Smalltalk › Squeak › Squeak - Dev

Performance issue initializing new Dictionary's

_‹ Previous Topic Next Topic _›

Classic

List

Threaded

5 messages Options

Chris Muller-3

Performance issue initializing new Dictionary's

Hi all,

I just noticed that attempting to optimize the creation of a Dictionary or OrderedDictionary by specifying a pre-allocated size, actually slows the system down more than not using this optimization. In trunk, see?

[Dictionary new] bench. '26,300,000 per second. 38.1 nanoseconds per run. 9.21816 % GC time.'

[Dictionary new: 4] bench. '3,560,000 per second. 281 nanoseconds per run. 2.41952 % GC time.'

So then I tried to debug it to see why, and I couldn't without my Tools-cmm.926 patch currently in the Inbox for review.

Dictionary new: 4 "try to debug-it and step into"

The whole purpose of #new: over #new is to increase the performance of allocation when the minimum size is known in advance, but we seem to have killed this goal by the cost of #goodPrimeAtLeast:. Levente?

Best,

Chris

Nicolas Cellier

Re: Performance issue initializing new Dictionary's

Hi Chris,

The purpose of good prime sizes is to accelerate random access at: and at:put:. So more ellaborate bench should be used to measure if worth.

Le mar. 17 déc. 2019 à 23:02, Chris Muller <[hidden email]> a écrit :

Hi all,

I just noticed that attempting to optimize the creation of a Dictionary or OrderedDictionary by specifying a pre-allocated size, actually slows the system down more than not using this optimization. In trunk, see?

[Dictionary new] bench. '26,300,000 per second. 38.1 nanoseconds per run. 9.21816 % GC time.'

[Dictionary new: 4] bench. '3,560,000 per second. 281 nanoseconds per run. 2.41952 % GC time.'

So then I tried to debug it to see why, and I couldn't without my Tools-cmm.926 patch currently in the Inbox for review.

Dictionary new: 4 "try to debug-it and step into"

The whole purpose of #new: over #new is to increase the performance of allocation when the minimum size is known in advance, but we seem to have killed this goal by the cost of #goodPrimeAtLeast:. Levente?

Best,
Chris

Chris Muller-3

Re: Performance issue initializing new Dictionary's

I know that, but I'm talking about allocation speed, not access speed. This #new: violates the expectations of your typical Smalltalk programmer, they'll write code they think is optimized but, actually, is much much slower.

Surely there's a faster way to come up with a next-higher prime? Certainly, that primes table seems unnecessarily large for this purpose -- anyone creating a Dictionary with > 1M preallocated slots is a special-case scenario we should not slow down the "normal" cases for...

- Chris

On Tue, Dec 17, 2019 at 4:33 PM Nicolas Cellier <[hidden email]> wrote:

Hi Chris,
The purpose of good prime sizes is to accelerate random access at: and at:put:. So more ellaborate bench should be used to measure if worth.

Le mar. 17 déc. 2019 à 23:02, Chris Muller <[hidden email]> a écrit :
Hi all,

I just noticed that attempting to optimize the creation of a Dictionary or OrderedDictionary by specifying a pre-allocated size, actually slows the system down more than not using this optimization. In trunk, see?

[Dictionary new] bench. '26,300,000 per second. 38.1 nanoseconds per run. 9.21816 % GC time.'

[Dictionary new: 4] bench. '3,560,000 per second. 281 nanoseconds per run. 2.41952 % GC time.'

So then I tried to debug it to see why, and I couldn't without my Tools-cmm.926 patch currently in the Inbox for review.

Dictionary new: 4 "try to debug-it and step into"

The whole purpose of #new: over #new is to increase the performance of allocation when the minimum size is known in advance, but we seem to have killed this goal by the cost of #goodPrimeAtLeast:. Levente?

Best,
Chris

Levente Uzonyi

Re: Performance issue initializing new Dictionary's

In reply to this post by Chris Muller-3

Hi Chris,

On Tue, 17 Dec 2019, Chris Muller wrote:

> Hi all,
> I just noticed that attempting to optimize the creation of a Dictionary or OrderedDictionary by specifying a pre-allocated size, actually slows the system down more than not using this optimization. In trunk, see?
>
> [Dictionary new] bench. '26,300,000 per second. 38.1 nanoseconds per run. 9.21816 % GC time.'
>
> [Dictionary new: 4] bench. '3,560,000 per second. 281 nanoseconds per run. 2.41952 % GC time.'
>
> So then I tried to debug it to see why, and I couldn't without my Tools-cmm.926 patch currently in the Inbox for review.

You can't beat #new with #new: if you need to store <= 3 elements, because
#new has the capacity of 5 hard-coded, while #new: has to figure out the
correct capacity.
If you intend to store 4 pairs in your dictionary, then it's still worth
to use #new: instead of #new, because growing costs a lot more than
allocating well-sized collections[1].

As you found out, the performance difference between #new and #new: boils
down to #goodPrimesAtLeast:. But #sizeFor:, and allocation (7 slots
instead of 5) make a difference as well.

I just pushed Collections-ul.867 to the inbox, which should help with
your problem, but this still won't make [ Dictionary new: 4 ] as fast as [
Dictionary new ]. Why?
Even though these changes make #goodPrimesAtLeast: ~8x faster for your
case, #new: will still take about two times as long as #new.

Levente

[1]
[ Dictionary new
at: 1 put: 1;
at: 2 put: 2;
at: 3 put: 3;
at: 4 put: 4 ] bench. '2,110,000 per second. 474 nanoseconds per run. 4.87805 % GC time.'.
[ (Dictionary new: 4)
at: 1 put: 1;
at: 2 put: 2;
at: 3 put: 3;
at: 4 put: 4 ] bench. '3,580,000 per second. 279 nanoseconds per run. 5.19896 % GC time.'

>
> Dictionary new: 4 "try to debug-it and step into"
>
> The whole purpose of #new: over #new is to increase the performance of allocation when the minimum size is known in advance, but we seem to have killed this goal by the cost of #goodPrimeAtLeast:. Levente?
>
> Best,
> Chris
>
>

Chris Muller-4

Re: Performance issue initializing new Dictionary's

You're right. I was actually dealing with an OrderedDictionary and got too comfortable seeing it through a "SequenceableCollection" lens, forgot about the hashing side and the need for growTo to calculate a goodPrime anyway. That clears up my confusion, thanks. :)

So, goodPrime is a trade-off between allocation and access performance. I definitely like your optimization to favor access.

goodPrime is an unvaoidable cost, and your Collections-ul.867 looks like a 3X improvement (for my Dictionary example), and low risk. Worth including in 5.3, IMO..

Thanks,

Chris

On Tue, Dec 17, 2019 at 7:39 PM Levente Uzonyi <[hidden email]> wrote:

Hi Chris,

On Tue, 17 Dec 2019, Chris Muller wrote:

> Hi all,
> I just noticed that attempting to optimize the creation of a Dictionary or OrderedDictionary by specifying a pre-allocated size, actually slows the system down more than not using this optimization. In trunk, see?
>
> [Dictionary new] bench. '26,300,000 per second. 38.1 nanoseconds per run. 9.21816 % GC time.'
>
> [Dictionary new: 4] bench. '3,560,000 per second. 281 nanoseconds per run. 2.41952 % GC time.'
>
> So then I tried to debug it to see why, and I couldn't without my Tools-cmm.926 patch currently in the Inbox for review.

You can't beat #new with #new: if you need to store <= 3 elements, because
#new has the capacity of 5 hard-coded, while #new: has to figure out the
correct capacity.
If you intend to store 4 pairs in your dictionary, then it's still worth
to use #new: instead of #new, because growing costs a lot more than
allocating well-sized collections[1].

As you found out, the performance difference between #new and #new: boils
down to #goodPrimesAtLeast:. But #sizeFor:, and allocation (7 slots
instead of 5) make a difference as well.

I just pushed Collections-ul.867 to the inbox, which should help with
your problem, but this still won't make [ Dictionary new: 4 ] as fast as [
Dictionary new ]. Why?
Even though these changes make #goodPrimesAtLeast: ~8x faster for your
case, #new: will still take about two times as long as #new.

Levente

[1]
[ Dictionary new
at: 1 put: 1;
at: 2 put: 2;
at: 3 put: 3;
at: 4 put: 4 ] bench. '2,110,000 per second. 474 nanoseconds per run. 4.87805 % GC time.'.
[ (Dictionary new: 4)
at: 1 put: 1;
at: 2 put: 2;
at: 3 put: 3;
at: 4 put: 4 ] bench. '3,580,000 per second. 279 nanoseconds per run. 5.19896 % GC time.'

>
> Dictionary new: 4 "try to debug-it and step into"
>
> The whole purpose of #new: over #new is to increase the performance of allocation when the minimum size is known in advance, but we seem to have killed this goal by the cost of #goodPrimeAtLeast:. Levente?
>
> Best,
> Chris
>
>