Performance issue initializing new Dictionary's

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Performance issue initializing new Dictionary's

Chris Muller-3
Hi all,

I just noticed that attempting to optimize the creation of a Dictionary or OrderedDictionary by specifying a pre-allocated size, actually slows the system down more than not using this optimization.  In trunk, see?

[Dictionary new] bench.        '26,300,000 per second. 38.1 nanoseconds per run. 9.21816 % GC time.'

[Dictionary new: 4] bench.       '3,560,000 per second. 281 nanoseconds per run. 2.41952 % GC time.'

So then I tried to debug it to see why, and I couldn't without my Tools-cmm.926 patch currently in the Inbox for review.

       Dictionary new: 4    "try to debug-it and step into"

The whole purpose of #new: over #new is to increase the performance of allocation when the minimum size is known in advance, but we seem to have killed this goal by the cost of #goodPrimeAtLeast:.  Levente?

Best,
  Chris


Reply | Threaded
Open this post in threaded view
|

Re: Performance issue initializing new Dictionary's

Nicolas Cellier
Hi Chris,
The purpose of good prime sizes is to accelerate random access at: and at:put:. So more ellaborate bench should be used to measure if worth.

Le mar. 17 déc. 2019 à 23:02, Chris Muller <[hidden email]> a écrit :
Hi all,

I just noticed that attempting to optimize the creation of a Dictionary or OrderedDictionary by specifying a pre-allocated size, actually slows the system down more than not using this optimization.  In trunk, see?

[Dictionary new] bench.        '26,300,000 per second. 38.1 nanoseconds per run. 9.21816 % GC time.'

[Dictionary new: 4] bench.       '3,560,000 per second. 281 nanoseconds per run. 2.41952 % GC time.'

So then I tried to debug it to see why, and I couldn't without my Tools-cmm.926 patch currently in the Inbox for review.

       Dictionary new: 4    "try to debug-it and step into"

The whole purpose of #new: over #new is to increase the performance of allocation when the minimum size is known in advance, but we seem to have killed this goal by the cost of #goodPrimeAtLeast:.  Levente?

Best,
  Chris



Reply | Threaded
Open this post in threaded view
|

Re: Performance issue initializing new Dictionary's

Chris Muller-3
I know that, but I'm talking about allocation speed, not access speed.  This #new: violates the expectations of your typical Smalltalk programmer, they'll write code they think is optimized but, actually, is much much slower.

Surely there's a faster way to come up with a next-higher prime?  Certainly, that primes table seems unnecessarily large for this purpose -- anyone creating a Dictionary with > 1M preallocated slots is a special-case scenario we should not slow down the "normal" cases for...

 - Chris

On Tue, Dec 17, 2019 at 4:33 PM Nicolas Cellier <[hidden email]> wrote:
Hi Chris,
The purpose of good prime sizes is to accelerate random access at: and at:put:. So more ellaborate bench should be used to measure if worth.

Le mar. 17 déc. 2019 à 23:02, Chris Muller <[hidden email]> a écrit :
Hi all,

I just noticed that attempting to optimize the creation of a Dictionary or OrderedDictionary by specifying a pre-allocated size, actually slows the system down more than not using this optimization.  In trunk, see?

[Dictionary new] bench.        '26,300,000 per second. 38.1 nanoseconds per run. 9.21816 % GC time.'

[Dictionary new: 4] bench.       '3,560,000 per second. 281 nanoseconds per run. 2.41952 % GC time.'

So then I tried to debug it to see why, and I couldn't without my Tools-cmm.926 patch currently in the Inbox for review.

       Dictionary new: 4    "try to debug-it and step into"

The whole purpose of #new: over #new is to increase the performance of allocation when the minimum size is known in advance, but we seem to have killed this goal by the cost of #goodPrimeAtLeast:.  Levente?

Best,
  Chris




Reply | Threaded
Open this post in threaded view
|

Re: Performance issue initializing new Dictionary's

Levente Uzonyi
In reply to this post by Chris Muller-3
Hi Chris,

On Tue, 17 Dec 2019, Chris Muller wrote:

> Hi all,
> I just noticed that attempting to optimize the creation of a Dictionary or OrderedDictionary by specifying a pre-allocated size, actually slows the system down more than not using this optimization.  In trunk, see?
>
> [Dictionary new] bench.        '26,300,000 per second. 38.1 nanoseconds per run. 9.21816 % GC time.'
>
> [Dictionary new: 4] bench.       '3,560,000 per second. 281 nanoseconds per run. 2.41952 % GC time.'
>
> So then I tried to debug it to see why, and I couldn't without my Tools-cmm.926 patch currently in the Inbox for review.

You can't beat #new with #new: if you need to store <= 3 elements, because
#new has the capacity of 5 hard-coded, while #new: has to figure out the
correct capacity.
If you intend to store 4 pairs in your dictionary, then it's still worth
to use #new: instead of #new, because growing costs a lot more than
allocating well-sized collections[1].

As you found out, the performance difference between #new and #new: boils
down to #goodPrimesAtLeast:. But #sizeFor:, and allocation (7 slots
instead of 5) make a difference as well.

I just pushed Collections-ul.867 to the inbox, which should help with
your problem, but this still won't make [ Dictionary new: 4 ] as fast as [
Dictionary new ]. Why?
Even though these changes make #goodPrimesAtLeast: ~8x faster for your
case, #new: will still take about two times as long as #new.


Levente

[1]
[ Dictionary new
  at: 1 put: 1;
  at: 2 put: 2;
  at: 3 put: 3;
  at: 4 put: 4 ] bench. '2,110,000 per second. 474 nanoseconds per run. 4.87805 % GC time.'.
[ (Dictionary new: 4)
  at: 1 put: 1;
  at: 2 put: 2;
  at: 3 put: 3;
  at: 4 put: 4 ] bench. '3,580,000 per second. 279 nanoseconds per run. 5.19896 % GC time.'

>
>        Dictionary new: 4    "try to debug-it and step into"
>
> The whole purpose of #new: over #new is to increase the performance of allocation when the minimum size is known in advance, but we seem to have killed this goal by the cost of #goodPrimeAtLeast:.  Levente?
>
> Best,
>   Chris
>
>

Reply | Threaded
Open this post in threaded view
|

Re: Performance issue initializing new Dictionary's

Chris Muller-4
You're right.  I was actually dealing with an OrderedDictionary and got too comfortable seeing it through a "SequenceableCollection" lens, forgot about the hashing side and the need for growTo to calculate a goodPrime anyway.  That clears up my confusion, thanks.    :)

So, goodPrime is a trade-off between allocation and access performance.  I definitely like your optimization to favor access.

goodPrime is an unvaoidable cost, and your Collections-ul.867 looks like a 3X improvement (for my Dictionary example), and low risk.  Worth including in 5.3, IMO..

Thanks,
  Chris


On Tue, Dec 17, 2019 at 7:39 PM Levente Uzonyi <[hidden email]> wrote:
Hi Chris,

On Tue, 17 Dec 2019, Chris Muller wrote:

> Hi all,
> I just noticed that attempting to optimize the creation of a Dictionary or OrderedDictionary by specifying a pre-allocated size, actually slows the system down more than not using this optimization.  In trunk, see?
>
> [Dictionary new] bench.        '26,300,000 per second. 38.1 nanoseconds per run. 9.21816 % GC time.'
>
> [Dictionary new: 4] bench.       '3,560,000 per second. 281 nanoseconds per run. 2.41952 % GC time.'
>
> So then I tried to debug it to see why, and I couldn't without my Tools-cmm.926 patch currently in the Inbox for review.

You can't beat #new with #new: if you need to store <= 3 elements, because
#new has the capacity of 5 hard-coded, while #new: has to figure out the
correct capacity.
If you intend to store 4 pairs in your dictionary, then it's still worth
to use #new: instead of #new, because growing costs a lot more than
allocating well-sized collections[1].

As you found out, the performance difference between #new and #new: boils
down to #goodPrimesAtLeast:. But #sizeFor:, and allocation (7 slots
instead of 5) make a difference as well.

I just pushed Collections-ul.867 to the inbox, which should help with
your problem, but this still won't make [ Dictionary new: 4 ] as fast as [
Dictionary new ]. Why?
Even though these changes make #goodPrimesAtLeast: ~8x faster for your
case, #new: will still take about two times as long as #new.


Levente

[1]
[ Dictionary new
        at: 1 put: 1;
        at: 2 put: 2;
        at: 3 put: 3;
        at: 4 put: 4 ] bench. '2,110,000 per second. 474 nanoseconds per run. 4.87805 % GC time.'.
[ (Dictionary new: 4)
        at: 1 put: 1;
        at: 2 put: 2;
        at: 3 put: 3;
        at: 4 put: 4 ] bench. '3,580,000 per second. 279 nanoseconds per run. 5.19896 % GC time.'

>
>        Dictionary new: 4    "try to debug-it and step into"
>
> The whole purpose of #new: over #new is to increase the performance of allocation when the minimum size is known in advance, but we seem to have killed this goal by the cost of #goodPrimeAtLeast:.  Levente?
>
> Best,
>   Chris
>
>