The Inbox: Collections-cmm.874.mcz

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
24 messages Options
12
Reply | Threaded
Open this post in threaded view
|

The Inbox: Collections-cmm.874.mcz

commits-2
Chris Muller uploaded a new version of Collections to project The Inbox:
http://source.squeak.org/inbox/Collections-cmm.874.mcz

==================== Summary ====================

Name: Collections-cmm.874
Author: cmm
Time: 24 January 2020, 3:32:43.628249 pm
UUID: 107f3a74-177c-4338-9ad3-648e427419a1
Ancestors: Collections-ul.871

- Optimize for system compactness by ensuring the default internal array size of any HashedCollection is not initialized larger than it may ever need to be.
- Let #new: be used to define larger sizes than the minimum, and perform comparably with #new even if the minimum size is specified.

=============== Diff against Collections-ul.871 ===============

Item was changed:
  ----- Method: HashedCollection class>>new (in category 'instance creation') -----
  new
+ ^ self basicNew initialize: 3!
- ^ self basicNew initialize: 5!


Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: Collections-cmm.874.mcz

Chris Muller-3
Hi all,

In my trunk image, currently >10% of Dictionary instances have unnecessarily large internal arrays because they were created with #new but never grew.  

     ((Dictionary allInstances count: [ : e | e size < 3 and: [ e array size >= 5 ] ]) / Dictionary allInstances size) asFloat      "0.11282051282051282"

A developer wanting compactness should not be required to probe into the internal implementation to know whether they can write the most compact code, "Dictionary new" and "Set new".  The most compact code should result in the most compact size by default, and only if a *larger* default size is desired, use #new: for optimization.

This also addresses the issue I've been discussing with Levente, ensuring #new: maintains its performance optimization along with space, as with OrderedCollections, etc.

Best,
  Chris




On Fri, Jan 24, 2020 at 3:32 PM <[hidden email]> wrote:
Chris Muller uploaded a new version of Collections to project The Inbox:
http://source.squeak.org/inbox/Collections-cmm.874.mcz

==================== Summary ====================

Name: Collections-cmm.874
Author: cmm
Time: 24 January 2020, 3:32:43.628249 pm
UUID: 107f3a74-177c-4338-9ad3-648e427419a1
Ancestors: Collections-ul.871

- Optimize for system compactness by ensuring the default internal array size of any HashedCollection is not initialized larger than it may ever need to be.
- Let #new: be used to define larger sizes than the minimum, and perform comparably with #new even if the minimum size is specified.

=============== Diff against Collections-ul.871 ===============

Item was changed:
  ----- Method: HashedCollection class>>new (in category 'instance creation') -----
  new
+       ^ self basicNew initialize: 3!
-       ^ self basicNew initialize: 5!




Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: Collections-cmm.874.mcz

Chris Muller-3
Wow, the number of oversized OrderedCollection instances is much worse, 92%.

((OrderedCollection allInstances count: [ : e | e size < 10 and: [ e array size >= 10 ] ]) /
OrderedCollection allInstances size) asFloat.

On Fri, Jan 24, 2020 at 4:01 PM Chris Muller <[hidden email]> wrote:
Hi all,

In my trunk image, currently >10% of Dictionary instances have unnecessarily large internal arrays because they were created with #new but never grew.  

     ((Dictionary allInstances count: [ : e | e size < 3 and: [ e array size >= 5 ] ]) / Dictionary allInstances size) asFloat      "0.11282051282051282"

A developer wanting compactness should not be required to probe into the internal implementation to know whether they can write the most compact code, "Dictionary new" and "Set new".  The most compact code should result in the most compact size by default, and only if a *larger* default size is desired, use #new: for optimization.

This also addresses the issue I've been discussing with Levente, ensuring #new: maintains its performance optimization along with space, as with OrderedCollections, etc.

Best,
  Chris




On Fri, Jan 24, 2020 at 3:32 PM <[hidden email]> wrote:
Chris Muller uploaded a new version of Collections to project The Inbox:
http://source.squeak.org/inbox/Collections-cmm.874.mcz

==================== Summary ====================

Name: Collections-cmm.874
Author: cmm
Time: 24 January 2020, 3:32:43.628249 pm
UUID: 107f3a74-177c-4338-9ad3-648e427419a1
Ancestors: Collections-ul.871

- Optimize for system compactness by ensuring the default internal array size of any HashedCollection is not initialized larger than it may ever need to be.
- Let #new: be used to define larger sizes than the minimum, and perform comparably with #new even if the minimum size is specified.

=============== Diff against Collections-ul.871 ===============

Item was changed:
  ----- Method: HashedCollection class>>new (in category 'instance creation') -----
  new
+       ^ self basicNew initialize: 3!
-       ^ self basicNew initialize: 5!




Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: Collections-cmm.874.mcz

Jakob Reschke
Note that your statistics do not account for transient collections. If minimizing the initial capacity made most uses of these collections slower, it might not be justified to do so.

Am Fr., 24. Jan. 2020 um 23:28 Uhr schrieb Chris Muller <[hidden email]>:
Wow, the number of oversized OrderedCollection instances is much worse, 92%.

((OrderedCollection allInstances count: [ : e | e size < 10 and: [ e array size >= 10 ] ]) /
OrderedCollection allInstances size) asFloat.

On Fri, Jan 24, 2020 at 4:01 PM Chris Muller <[hidden email]> wrote:
Hi all,

In my trunk image, currently >10% of Dictionary instances have unnecessarily large internal arrays because they were created with #new but never grew.  

     ((Dictionary allInstances count: [ : e | e size < 3 and: [ e array size >= 5 ] ]) / Dictionary allInstances size) asFloat      "0.11282051282051282"

A developer wanting compactness should not be required to probe into the internal implementation to know whether they can write the most compact code, "Dictionary new" and "Set new".  The most compact code should result in the most compact size by default, and only if a *larger* default size is desired, use #new: for optimization.

This also addresses the issue I've been discussing with Levente, ensuring #new: maintains its performance optimization along with space, as with OrderedCollections, etc.

Best,
  Chris




On Fri, Jan 24, 2020 at 3:32 PM <[hidden email]> wrote:
Chris Muller uploaded a new version of Collections to project The Inbox:
http://source.squeak.org/inbox/Collections-cmm.874.mcz

==================== Summary ====================

Name: Collections-cmm.874
Author: cmm
Time: 24 January 2020, 3:32:43.628249 pm
UUID: 107f3a74-177c-4338-9ad3-648e427419a1
Ancestors: Collections-ul.871

- Optimize for system compactness by ensuring the default internal array size of any HashedCollection is not initialized larger than it may ever need to be.
- Let #new: be used to define larger sizes than the minimum, and perform comparably with #new even if the minimum size is specified.

=============== Diff against Collections-ul.871 ===============

Item was changed:
  ----- Method: HashedCollection class>>new (in category 'instance creation') -----
  new
+       ^ self basicNew initialize: 3!
-       ^ self basicNew initialize: 5!





Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: Collections-cmm.874.mcz

Chris Muller-4
Hi Jakob,

Optimal execution performance can never be assured with #new, since the potential cost of under-allocation in various places is offset by the gains of not over-allocating in other places. 

Although it's impossible to tweak the default initial size in #new to optimize for performance, we can at least optimize for space, and also for clarity-of-usage.  #new, by nature, expresses a "lack of concern for optimization", so Squeak should at least provide compactness in that case.  Places in the code that need optimization will correctly express that by writing #new: with a larger pre-allocation.

It's true that system level changes of any kind could result in the need for downstream changes but, in this case, it seems very unlikely.

Best,
  Chris


On Fri, Jan 24, 2020 at 4:51 PM Jakob Reschke <[hidden email]> wrote:
Note that your statistics do not account for transient collections. If minimizing the initial capacity made most uses of these collections slower, it might not be justified to do so.

Am Fr., 24. Jan. 2020 um 23:28 Uhr schrieb Chris Muller <[hidden email]>:
Wow, the number of oversized OrderedCollection instances is much worse, 92%.

((OrderedCollection allInstances count: [ : e | e size < 10 and: [ e array size >= 10 ] ]) /
OrderedCollection allInstances size) asFloat.

On Fri, Jan 24, 2020 at 4:01 PM Chris Muller <[hidden email]> wrote:
Hi all,

In my trunk image, currently >10% of Dictionary instances have unnecessarily large internal arrays because they were created with #new but never grew.  

     ((Dictionary allInstances count: [ : e | e size < 3 and: [ e array size >= 5 ] ]) / Dictionary allInstances size) asFloat      "0.11282051282051282"

A developer wanting compactness should not be required to probe into the internal implementation to know whether they can write the most compact code, "Dictionary new" and "Set new".  The most compact code should result in the most compact size by default, and only if a *larger* default size is desired, use #new: for optimization.

This also addresses the issue I've been discussing with Levente, ensuring #new: maintains its performance optimization along with space, as with OrderedCollections, etc.

Best,
  Chris




On Fri, Jan 24, 2020 at 3:32 PM <[hidden email]> wrote:
Chris Muller uploaded a new version of Collections to project The Inbox:
http://source.squeak.org/inbox/Collections-cmm.874.mcz

==================== Summary ====================

Name: Collections-cmm.874
Author: cmm
Time: 24 January 2020, 3:32:43.628249 pm
UUID: 107f3a74-177c-4338-9ad3-648e427419a1
Ancestors: Collections-ul.871

- Optimize for system compactness by ensuring the default internal array size of any HashedCollection is not initialized larger than it may ever need to be.
- Let #new: be used to define larger sizes than the minimum, and perform comparably with #new even if the minimum size is specified.

=============== Diff against Collections-ul.871 ===============

Item was changed:
  ----- Method: HashedCollection class>>new (in category 'instance creation') -----
  new
+       ^ self basicNew initialize: 3!
-       ^ self basicNew initialize: 5!





Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: Collections-cmm.874.mcz

Levente Uzonyi
Hi Chris,

On Fri, 24 Jan 2020, Chris Muller wrote:

> Hi Jakob,
>
> Optimal execution performance can never be assured with #new, since the potential cost of under-allocation in various places is offset by the gains of not over-allocating in other places. 
>
> Although it's impossible to tweak the default initial size in #new to optimize for performance, we can at least optimize for space, and also for clarity-of-usage.  #new, by nature, expresses a "lack of concern for optimization", so Squeak should at least provide compactness in that case.  Places in the code that
> need optimization will correctly express that by writing #new: with a larger pre-allocation.
>
> It's true that system level changes of any kind could result in the need for downstream changes but, in this case, it seems very unlikely.

The way I understand it, Jakob says that your suggested change has a
chance to slow down things just to save a few hundred kilobytes of memory.
I don't think we would want to do that without measuring the effects of
the change somehow.

Same applies to OrderedCollections.

Following your reasoning, the optimal initial capacity for any dynamic
collection created with #new should be 0, but that doesn't feel right...


Levente

>
> Best,
>   Chris
>
>
> On Fri, Jan 24, 2020 at 4:51 PM Jakob Reschke <[hidden email]> wrote:
>       Note that your statistics do not account for transient collections. If minimizing the initial capacity made most uses of these collections slower, it might not be justified to do so.
>
> Am Fr., 24. Jan. 2020 um 23:28 Uhr schrieb Chris Muller <[hidden email]>:
>       Wow, the number of oversized OrderedCollection instances is much worse, 92%.
> ((OrderedCollection allInstances count: [ : e | e size < 10 and: [ e array size >= 10 ] ]) /
> OrderedCollection allInstances size) asFloat.
>
> On Fri, Jan 24, 2020 at 4:01 PM Chris Muller <[hidden email]> wrote:
>       Hi all,
> In my trunk image, currently >10% of Dictionary instances have unnecessarily large internal arrays because they were created with #new but never grew.  
>      ((Dictionary allInstances count: [ : e | e size < 3 and: [ e array size >= 5 ] ]) / Dictionary allInstances size) asFloat      "0.11282051282051282"
>
> A developer wanting compactness should not be required to probe into the internal implementation to know whether they can write the most compact code, "Dictionary new" and "Set new".  The most compact code should result in the most compact size by default, and only if a *larger* default size is
> desired, use #new: for optimization.
>
> This also addresses the issue I've been discussing with Levente, ensuring #new: maintains its performance optimization along with space, as with OrderedCollections, etc.
>
> Best,
>   Chris
>
>
>
>
> On Fri, Jan 24, 2020 at 3:32 PM <[hidden email]> wrote:
>       Chris Muller uploaded a new version of Collections to project The Inbox:
>       http://source.squeak.org/inbox/Collections-cmm.874.mcz
>
>       ==================== Summary ====================
>
>       Name: Collections-cmm.874
>       Author: cmm
>       Time: 24 January 2020, 3:32:43.628249 pm
>       UUID: 107f3a74-177c-4338-9ad3-648e427419a1
>       Ancestors: Collections-ul.871
>
>       - Optimize for system compactness by ensuring the default internal array size of any HashedCollection is not initialized larger than it may ever need to be.
>       - Let #new: be used to define larger sizes than the minimum, and perform comparably with #new even if the minimum size is specified.
>
>       =============== Diff against Collections-ul.871 ===============
>
>       Item was changed:
>         ----- Method: HashedCollection class>>new (in category 'instance creation') -----
>         new
>       +       ^ self basicNew initialize: 3!
>       -       ^ self basicNew initialize: 5!
>
>
>
>
>

Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: Collections-cmm.874.mcz

Chris Muller-3
On Fri, Jan 24, 2020 at 6:19 PM Levente Uzonyi <[hidden email]> wrote:
Hi Chris,

On Fri, 24 Jan 2020, Chris Muller wrote:

> Hi Jakob,
>
> Optimal execution performance can never be assured with #new, since the potential cost of under-allocation in various places is offset by the gains of not over-allocating in other places. 
>
> Although it's impossible to tweak the default initial size in #new to optimize for performance, we can at least optimize for space, and also for clarity-of-usage.  #new, by nature, expresses a "lack of concern for optimization", so Squeak should at least provide compactness in that case.  Places in the code that
> need optimization will correctly express that by writing #new: with a larger pre-allocation.
>
> It's true that system level changes of any kind could result in the need for downstream changes but, in this case, it seems very unlikely.

The way I understand it, Jakob says that your suggested change has a
chance to slow down things just to save a few hundred kilobytes of memory.

But your suggested change is 100% *guaranteed* to slow things down, with no space savings.

This avoids that particular slow down, while guarantee'ing to save space!  All for only a very teency-weency chance of other areas being minorly affected (and, easily fixed if they are!).
 
I don't think we would want to do that without measuring the effects of
the change somehow.

Performance measuring is already part of every system that cares, right?   :)
None of those systems are depending on any particular default initial size for performance.  If they are, they should be fixed.
 

Same applies to OrderedCollections.

Following your reasoning, the optimal initial capacity for any dynamic
collection created with #new should be 0, but that doesn't feel right...

Not 0, 3.  It's "reasoning", not extremism.  :)

3 would be a great default for OrderedCollection.  Ultra-minimal, but with a purpose.   I mean, really, with 10, we have 92% of all instances wasting space.

 - Chris



Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: Collections-cmm.874.mcz

Levente Uzonyi
Hi Chris,

On Fri, 24 Jan 2020, Chris Muller wrote:

> On Fri, Jan 24, 2020 at 6:19 PM Levente Uzonyi <[hidden email]> wrote:
>       Hi Chris,
>
>       On Fri, 24 Jan 2020, Chris Muller wrote:
>
>       > Hi Jakob,
>       >
>       > Optimal execution performance can never be assured with #new, since the potential cost of under-allocation in various places is offset by the gains of not over-allocating in other places. 
>       >
>       > Although it's impossible to tweak the default initial size in #new to optimize for performance, we can at least optimize for space, and also for clarity-of-usage.  #new, by nature, expresses a "lack of concern for optimization", so Squeak should at least provide compactness in that case.  Places in
>       the code that
>       > need optimization will correctly express that by writing #new: with a larger pre-allocation.
>       >
>       > It's true that system level changes of any kind could result in the need for downstream changes but, in this case, it seems very unlikely.
>
>       The way I understand it, Jakob says that your suggested change has a
>       chance to slow down things just to save a few hundred kilobytes of memory.
>
>
> But your suggested change is 100% *guaranteed* to slow things down, with no space savings.
Which change? To remove the optimization from #new? That wasn't a serious
suggestion just a possible solution to your problem (which I don't
really consider to be a problem).

>
> This avoids that particular slow down, while guarantee'ing to save space!  All for only a very teency-weency chance of other areas being minorly affected (and, easily fixed if they are!).

It's funny that you call it a slowdown. You may perceive it as one but
that's just a simple optimization applied to #new, which cannot
be directly applied to #new:.

>  
>       I don't think we would want to do that without measuring the effects of
>       the change somehow.
>
>
> Performance measuring is already part of every system that cares, right?   :)

Of course not. That's why "somehow" is there. I don't know how it could be
measured, but I think that without measurements, it's risky to change
something like that.

> None of those systems are depending on any particular default initial size for performance.  If they are, they should be fixed.

Here is an example scenario:
I write my code. I use [OrderedCollection new]. I see that my code is fast
enough.
You change the default capacity from 10 to 3. My code is now too slow. I
have to profile it to see why. It turns out that I store 7-9 elements most
of the time, and the capacity of 10 was a good fit, but 3 is not, because
it means growing twice (first to 6, then to 12), and my code ends up
being slower and using more memory than before.

>  
>
>       Same applies to OrderedCollections.
>
>       Following your reasoning, the optimal initial capacity for any dynamic
>       collection created with #new should be 0, but that doesn't feel right...
>
>
> Not 0, 3.  It's "reasoning", not extremism.  :)
>
> 3 would be a great default for OrderedCollection.  Ultra-minimal, but with a purpose.   I mean, really, with 10, we have 92% of all instances wasting space.
"Wasting space" is a must when you implement dynamic arrays and
hash tables. It's a space-time tradeoff[1].


Levente

[1] https://en.wikipedia.org/wiki/Space%E2%80%93time_tradeoff

>
>  - Chris
>
>
>

Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: Collections-cmm.874.mcz

Chris Muller-4
>  
>       I don't think we would want to do that without measuring the effects of
>       the change somehow.
>
>
> Performance measuring is already part of every system that cares, right?   :)

Of course not. That's why "somehow" is there. I don't know how it could be
measured, but I think that without measurements, it's risky to change
something like that.

You should be careful, Levente, Collections-ul.871 does it, too.  Everywhere #new: 1 or #new: 2 occurs, you've reduced the internal array size from 5 to 3.  And, yes, it did indeed introduce this unexpected anomaly!
 

> None of those systems are depending on any particular default initial size for performance.  If they are, they should be fixed.

Here is an example scenario:
I write my code. I use [OrderedCollection new]. I see that my code is fast
enough.
You change the default capacity from 10 to 3. My code is now too slow. I
have to profile it to see why. It turns out that I store 7-9 elements most
of the time, and the capacity of 10 was a good fit, but 3 is not, because
it means growing twice (first to 6, then to 12), and my code ends up
being slower and using more memory than before.

This example makes the case for this proposal, by showing that it was *depending on knowing the private, internal initial size*, for its performance.  By having written #new instead of #new: in performance-critical code, it was and still is less efficient than it could be.  No amount of "guessing" of an initial size will help execution performance, but could at least guarantee space efficiency.

More importantly, to me, it brings clear, definitive responsibility to #new: over #new.  Efficiency.

 

>  
>
>       Same applies to OrderedCollections.
>
>       Following your reasoning, the optimal initial capacity for any dynamic
>       collection created with #new should be 0, but that doesn't feel right...
>
>
> Not 0, 3.  It's "reasoning", not extremism.  :)
>
> 3 would be a great default for OrderedCollection.  Ultra-minimal, but with a purpose.   I mean, really, with 10, we have 92% of all instances wasting space.

"Wasting space" is a must when you implement dynamic arrays and
hash tables. It's a space-time tradeoff[1].

OrderedCollection isn't hashed.

If we want have a sliding-scale trade-off between space and performance, so that "Dictionary new" everywhere in the code could be treated to run time or space-efficiently, that'd be something I could respect.  We'd need to introduce some kind a variable or PoolDictionary constant to allow a developer to customize, for the entire image, all cases of "Dictionary new" to run space-efficiently (e.g., on a 1GB RPi) or time-efficiently.

But we should not break #new: 1 and #new: 2 for HashedCollections, please.

 - Chris
 


Levente

[1] https://en.wikipedia.org/wiki/Space%E2%80%93time_tradeoff




Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: Collections-cmm.874.mcz

Jakob Reschke
Am Sa., 25. Jan. 2020 um 06:55 Uhr schrieb Chris Muller <[hidden email]>:
Here is an example scenario:
I write my code. I use [OrderedCollection new]. I see that my code is fast
enough.
You change the default capacity from 10 to 3. My code is now too slow. I
have to profile it to see why. It turns out that I store 7-9 elements most
of the time, and the capacity of 10 was a good fit, but 3 is not, because
it means growing twice (first to 6, then to 12), and my code ends up
being slower and using more memory than before.

This example makes the case for this proposal, by showing that it was *depending on knowing the private, internal initial size*, for its performance.  By having written #new instead of #new: in performance-critical code, it was and still is less efficient than it could be.  No amount of "guessing" of an initial size will help execution performance, but could at least guarantee space efficiency.

If you optimize the default for space instead of sticking with a reasonable tradeoff, you might force people to use new: and think about the very implementation details of those collections to get back to reasonable results. You might turn a piece of code into a bottleneck even though it was not considered performance-critical before. So you may break/brake existing applications in a non-obvious manner and cause maintenance cost. And free time is also precious...

On the other hand, who else was bothered by too sparse hashed or ordered collections until now? Is it a problem that bothers many, in comparison to the group which the change could bother?

I suppose this is premature optimization. If people have identified compactness as a requirement, they shall use #new: with (domain specific) expected numbers or patch #new for their application. But don't force it on everyone. You wouldn't like me to submit a "performance optimization" that changes the new default capacity to 100 because my application happened to deal with collections of that size frequently and because memory is comparably cheap and large nowadays, would you?

We can only really know the impact if we have a benchmark or even an idea of realistic average collection usage. Maybe someone wrote a paper about that...


Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: Collections-cmm.874.mcz

Jakob Reschke
In reply to this post by Chris Muller-4
But we should not break #new: 1 and #new: 2 for HashedCollections, please.

Couldn't it be faster to use an OrderedCollection instead of a hashed one for such small numbers of elements? If the hash computation outweighs the linear search...


Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: Collections-cmm.874.mcz

Tobias Pape

> On 25.01.2020, at 10:23, Jakob Reschke <[hidden email]> wrote:
>
> But we should not break #new: 1 and #new: 2 for HashedCollections, please.
>
> Couldn't it be faster to use an OrderedCollection instead of a hashed one for such small numbers of elements? If the hash computation outweighs the linear search...
>
As I said.
This is just too much premature optimization of everything.
It worked well before. If you need really small key-value-mappings, you should make one yourself.

Let's just leave the current implementation alone, please. It is a-ok.

Best regards
        -Tobias


Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: Collections-cmm.874.mcz

Squeak - Dev mailing list
Let's just leave the current implementation alone, please. It is a-ok.

+1

/*—————————————————-*/
Sent from my iPhone
See https://objectnets.net and https://objectnets.org
https://datascilv.com and https://datascilv.org

On Jan 25, 2020, at 02:50, Tobias Pape <[hidden email]> wrote:


On 25.01.2020, at 10:23, Jakob Reschke <[hidden email]> wrote:

But we should not break #new: 1 and #new: 2 for HashedCollections, please.

Couldn't it be faster to use an OrderedCollection instead of a hashed one for such small numbers of elements? If the hash computation outweighs the linear search...

As I said.
This is just too much premature optimization of everything.
It worked well before. If you need really small key-value-mappings, you should make one yourself.

Let's just leave the current implementation alone, please. It is a-ok.

Best regards
   -Tobias




Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: Collections-cmm.874.mcz

Levente Uzonyi
In reply to this post by Chris Muller-4
Hi Chris,

On Fri, 24 Jan 2020, Chris Muller wrote:

>       >  
>       >       I don't think we would want to do that without measuring the effects of
>       >       the change somehow.
>       >
>       >
>       > Performance measuring is already part of every system that cares, right?   :)
>
>       Of course not. That's why "somehow" is there. I don't know how it could be
>       measured, but I think that without measurements, it's risky to change
>       something like that.
>
>
> You should be careful, Levente, Collections-ul.871 does it, too.  Everywhere #new: 1 or #new: 2 occurs, you've reduced the internal array size from 5 to 3.  And, yes, it did indeed introduce this unexpected anomaly!
When one says [Dictionary new: 1] or [Dictionary new: 2], one expects to
get a dictionary that can hold one or two elements, respectively.
Collections-ul.871, just like the former version, creates dictionaries
matching those expectations, but Dictionaries returned by the new version
use less memory.
So, what's the problem?

>  
>
>       > None of those systems are depending on any particular default initial size for performance.  If they are, they should be fixed.
>
>       Here is an example scenario:
>       I write my code. I use [OrderedCollection new]. I see that my code is fast
>       enough.
>       You change the default capacity from 10 to 3. My code is now too slow. I
>       have to profile it to see why. It turns out that I store 7-9 elements most
>       of the time, and the capacity of 10 was a good fit, but 3 is not, because
>       it means growing twice (first to 6, then to 12), and my code ends up
>       being slower and using more memory than before.
>
>
> This example makes the case for this proposal, by showing that it was *depending on knowing the private, internal initial size*, for its performance.  By having written #new instead of #new: in performance-critical code, it was and still is less efficient than it could be.  No amount of "guessing" of an initial
> size will help execution performance, but could at least guarantee space efficiency.
No, the example assumes that the user knows nothing about the
internals of Dictionary. The performance of the example depends on the
inital capacity.
How often do you investigate the internals of classes your code uses to
avoid relying on the default values coded into them when there are no
performance problems with your code? I guess never...

>
> More importantly, to me, it brings clear, definitive responsibility to #new: over #new.  Efficiency.

As I wrote it many times in this thread, #new: already gives you
efficiency over #new, because it lets you avoid growing.
The optimization in HashedCollection class >> #new is what makes you think
otherwise.
The problem stems from two things:
- you doing microbenchmarks on #new: and #new, ignoring the costs of
filling up the collections
- the simple optimization in #new

>
>  
>
>       >  
>       >
>       >       Same applies to OrderedCollections.
>       >
>       >       Following your reasoning, the optimal initial capacity for any dynamic
>       >       collection created with #new should be 0, but that doesn't feel right...
>       >
>       >
>       > Not 0, 3.  It's "reasoning", not extremism.  :)
>       >
>       > 3 would be a great default for OrderedCollection.  Ultra-minimal, but with a purpose.   I mean, really, with 10, we have 92% of all instances wasting space.
>
>       "Wasting space" is a must when you implement dynamic arrays and
>       hash tables. It's a space-time tradeoff[1].
>
>
> OrderedCollection isn't hashed.
OrderedCollection is a dynamic array[1].

>
> If we want have a sliding-scale trade-off between space and performance, so that "Dictionary new" everywhere in the code could be treated to run time or space-efficiently, that'd be something I could respect.  We'd need to introduce some kind a variable or PoolDictionary constant to allow a developer to

I don't think we want anything like that.

> customize, for the entire image, all cases of "Dictionary new" to run space-efficiently (e.g., on a 1GB RPi) or time-efficiently.
>
> But we should not break #new: 1 and #new: 2 for HashedCollections, please.

The performance difference between #new, and #new: X where X is <= 3,
didn't bother you (or anyone else) until recently.
I proposed all kinds of solutions to minimize the difference between #new
and #new:, and we seemed to agree on one of those.


Levente

[1] https://en.wikipedia.org/wiki/Dynamic_array

>
>  - Chris
>  
>
>
>       Levente
>
>       [1] https://en.wikipedia.org/wiki/Space%E2%80%93time_tradeoff
>
>
>
>

Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: Collections-cmm.874.mcz

Chris Muller-3
Hi Levente, hi Jakob,

Thanks for the interesting discussion.  I'd like you to know, I'm not gung-ho about this change, but do think we should seriously consider it for the benefit of Squeak.  I think the benefit is real, but deceptive.

It seems there are two dimensions to the decision:

  - legacy / compatibility
  - API design / user-expectations

I do respect your point about legacy, that writing #new has always meant you get something that can hold up to 10 elements before needing to grow, instead of only 3.
It "sounds" reasonable, but...

Here are some certainties:

    - Allocating a 3 element array is quicker than allocating 10 element one.
    - 3 element Array's take up less memory than 10 element ones
    - consuming more RAM can lead to slowdowns due to paging or GC overhead
    - In the worst possible case (e.g., doing it over and over and nothing else), adding 9 elements to an (OrderedCollection new: 3) is 72% the speed of adding 9 elements to an (OrderedCollection new: 10). see [1]

Here are some uncertainties:

     - there may be some code somewhere creating many thousands of OrderedCollections (if it were only a few, it wouldn't be noticed)
     - the many thousands are all created in a very short amount of time (if it were spread out over time, it wouldn't be noticed)
     - it then stores 7-9 elements in most of the OrderedCollections
     - in spite of all of the above, the author still wrote #new instead of #new:

Letting fear of such remote uncertainties dominate your decision even in light of those certainties brings paralysis.  Squeak can hardly improve if such low-risk items can't even be attempted at the beginning of a development cycle, when applications will surely undergo testing before deploying to the next version of Squeak (5.4).  Our community is small (low chance) and helpful (low impact).
___
That leaves the API discussion, which was mostly ignored, but is really the key.  I would like to touch on this with some responses, below:
 
> You should be careful, Levente, Collections-ul.871 does it, too.  Everywhere #new: 1 or #new: 2 occurs, you've reduced the internal array size from 5 to 3.  And, yes, it did indeed introduce this unexpected anomaly!

When one says [Dictionary new: 1] or [Dictionary new: 2], one expects to
get a dictionary that can hold one or two elements, respectively.

How many should one "expect" to get with #new?   Because one reason you gave for not wanting to reduce the default size seems to be based on some "expectation" you have for it.

There should be only one:  that it shouldn't be significantly faster than #new: 1.   Nothing else.
 
Collections-ul.871, just like the former version, creates dictionaries
matching those expectations, but Dictionaries returned by the new version
use less memory.
So, what's the problem?

It slowed down (Dictionary new: 1) relative to trunk, 
by a comparable margin as adding 9 elements to an (OrderedCollection new: 3) relative to an (OrderedCollection new: 10 
(see [1])

How often do you investigate the internals of classes your code uses to
avoid relying on the default values coded into them

The answer is already "never" right here...
 
when there are no
performance problems with your code? I guess never...

... but I still want to write "optimized" code even _before_ having any "performance problems."   Proactivity.  :)
  
> But we should not break #new: 1 and #new: 2 for HashedCollections, please.

The performance difference between #new, and #new: X where X is <= 3,
didn't bother you (or anyone else) until recently.

Because the problem is in proposal Collections-ul.871, not trunk.
 
I proposed all kinds of solutions to minimize the difference between #new
and #new:, and we seemed to agree on one of those.

Cool, let's do one of those, thanks!

 - Chris
 
[1] --         (adding 9 elements to OrderedCollection new: 10)
                100% of baseline rate, 3,860,000 per second. 259 nanoseconds per run. 2.88 % GC time.'

                (adding 9 elements to OrderedCollection new: 3)
                72% of baseline rate, 2,790,000 per second. 359 nanoseconds per run. 3.55929 % GC time.

                (Dictionary new: 1) in trunk
                21,800,000 per second. 45.9 nanoseconds per run. 13.8 % GC time.

                 (Dicitionary new: 1) with Collections-ul.871
                 '16,300,000 per second. 61.4 nanoseconds per run. 5.05899 % GC time.'


Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: Collections-cmm.874.mcz

Chris Muller-3
In reply to this post by Jakob Reschke
Hi Jakob,

My main reply is in the other one to you and Levente, but some quick responses here for minor embellishment.  :)

On Sat, Jan 25, 2020 at 3:21 AM Jakob Reschke <[hidden email]> wrote:
Am Sa., 25. Jan. 2020 um 06:55 Uhr schrieb Chris Muller <[hidden email]>:
Here is an example scenario:
I write my code. I use [OrderedCollection new]. I see that my code is fast
enough.
You change the default capacity from 10 to 3. My code is now too slow. I
have to profile it to see why. It turns out that I store 7-9 elements most
of the time, and the capacity of 10 was a good fit, but 3 is not, because
it means growing twice (first to 6, then to 12), and my code ends up
being slower and using more memory than before.

This example makes the case for this proposal, by showing that it was *depending on knowing the private, internal initial size*, for its performance.  By having written #new instead of #new: in performance-critical code, it was and still is less efficient than it could be.  No amount of "guessing" of an initial size will help execution performance, but could at least guarantee space efficiency.

If you optimize the default for space instead of sticking with a reasonable tradeoff,

"reasonable tradeoff" is what I'm trying to convince you is completely baseless.
 
you might force people to use new: and think about the very implementation details of those collections to get back to reasonable results.

Its no different than we have now.  Thinking about the size wherever you can is a good thing.

Your fear of changing #new is because of the fuzziness of its definition.  What you're calling "reasonable" is actually just "random".  If it were definitive (e.g., space-efficient), the impact of changing it would be, too.

You might turn a piece of code into a bottleneck even though it was not considered performance-critical before.

Or it might rescue a suffering application because it's no longer paging RAM out to disk...  :)

On the other hand, who else was bothered by too sparse hashed or ordered collections until now?

It's about designing the most-efficient system and the best API, not who has been bothered yet.
 
Is it a problem that bothers many, in comparison to the group which the change could bother?

What happens to that group when they move their code to another Smalltalk which uses a different default?
 
I suppose this is premature optimization. If people have identified compactness as a requirement,

When all else is equal, more compact is always better than less.
 
they shall use #new: with (domain specific) expected numbers or patch #new for their application. But don't force it on everyone.

Patching #new and then using it because you patched it is a ridiculous suggestion.  That's what #new: is for.  This is about Squeak, not any one app..  10 is currently "forced" on everyone, and with 92% of OrderedCollections in trunk over-allocated, a smaller choice might be better...
 
You wouldn't like me to submit a "performance optimization" that changes the new default capacity to 100 because my application happened to deal with collections of that size frequently and because memory is comparably cheap and large nowadays, would you?

It's no less arbitrary than 10.  Both guarantee nothing.  At least 1, 2, or 3 guarantees space efficiency, and guarantees to make the API more definitive.

#new is fuzzy.  The whole reason you're worried about uses of #new being affected at all is because of that fuzziness.  We should give it clarity, make it definitively space-efficient...

The core library cannot possibly guess the shape of people's domains.  Our attempts to do so are causing more harm than good..
 
We can only really know the impact if we have a benchmark or even an idea of realistic average collection usage. Maybe someone wrote a paper about that...

The beginning of a dev cycle is where such a change can be implemented, leaving plenty of time for testing.

> Couldn't it be faster to use an OrderedCollection instead of a hashed one for such small numbers of elements? If the hash computation outweighs the linear search...
As mentioned, this is about Squeak system efficiency and API design, not any one specific app.

 - Chris



Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: Collections-cmm.874.mcz

Nicolas Cellier
In reply to this post by Chris Muller-3

Hi Chris,

Le mer. 29 janv. 2020 à 09:39, Chris Muller <[hidden email]> a écrit :
Hi Levente, hi Jakob,

Thanks for the interesting discussion.  I'd like you to know, I'm not gung-ho about this change, but do think we should seriously consider it for the benefit of Squeak.  I think the benefit is real, but deceptive.

It seems there are two dimensions to the decision:

  - legacy / compatibility
  - API design / user-expectations

I do respect your point about legacy, that writing #new has always meant you get something that can hold up to 10 elements before needing to grow, instead of only 3.
It "sounds" reasonable, but...

Here are some certainties:

    - Allocating a 3 element array is quicker than allocating 10 element one.

Hmm not sure about that. Not for single or few objects.
Allocating many larger short lived objects will increase the rate of scavenging statistically, but this will be measurable only is allocating massively this kind of objects IMO.

    - 3 element Array's take up less memory than 10 element ones
    - consuming more RAM can lead to slowdowns due to paging or GC overhead
    - In the worst possible case (e.g., doing it over and over and nothing else), adding 9 elements to an (OrderedCollection new: 3) is 72% the speed of adding 9 elements to an (OrderedCollection new: 10). see [1]


Typical Smalltalk objects are short lived.
That's why we get a generational garbage collection.
So typical usage is unlikely to generate paging.
Case of paging can only occur if many of these objects are longer lived (and tenured), which again is not typical.
For specific usage, there might be specific optimizations like growing a bit the Eden.

Here are some uncertainties:

     - there may be some code somewhere creating many thousands of OrderedCollections (if it were only a few, it wouldn't be noticed)
     - the many thousands are all created in a very short amount of time (if it were spread out over time, it wouldn't be noticed)
     - it then stores 7-9 elements in most of the OrderedCollections
     - in spite of all of the above, the author still wrote #new instead of #new:


All your analysis is exclusively focused on a specific un-typical usage of the library...
You are then trying to bend the general purpose library tothis specific case.
IMO this should be handled with a specific optimization.

I did propose using something like a SmallDictionary that is WAY more optimized for small size than hashed collection.
If not all your dictionary are small, you could have a subclass that switch (becomeForward:) representation when needing to grow.

Letting fear of such remote uncertainties dominate your decision even in light of those certainties brings paralysis.  Squeak can hardly improve if such low-risk items can't even be attempted at the beginning of a development cycle, when applications will surely undergo testing before deploying to the next version of Squeak (5.4).  Our community is small (low chance) and helpful (low impact).
___
That leaves the API discussion, which was mostly ignored, but is really the key.  I would like to touch on this with some responses, below:
 
> You should be careful, Levente, Collections-ul.871 does it, too.  Everywhere #new: 1 or #new: 2 occurs, you've reduced the internal array size from 5 to 3.  And, yes, it did indeed introduce this unexpected anomaly!

When one says [Dictionary new: 1] or [Dictionary new: 2], one expects to
get a dictionary that can hold one or two elements, respectively.

How many should one "expect" to get with #new?   Because one reason you gave for not wanting to reduce the default size seems to be based on some "expectation" you have for it.

There should be only one:  that it shouldn't be significantly faster than #new: 1.   Nothing else.
 
Collections-ul.871, just like the former version, creates dictionaries
matching those expectations, but Dictionaries returned by the new version
use less memory.
So, what's the problem?

It slowed down (Dictionary new: 1) relative to trunk, 
by a comparable margin as adding 9 elements to an (OrderedCollection new: 3) relative to an (OrderedCollection new: 10 
(see [1])

IMO this is typical biased usage of percentages...
Saving 30% of a short duration or 30% of a long duration is not at all the same thing!
The former case is premature optimization presumably unless used in tight loops.

How often do you investigate the internals of classes your code uses to
avoid relying on the default values coded into them

The answer is already "never" right here...
 
when there are no
performance problems with your code? I guess never...

... but I still want to write "optimized" code even _before_ having any "performance problems."   Proactivity.  :)
  
> But we should not break #new: 1 and #new: 2 for HashedCollections, please.

The performance difference between #new, and #new: X where X is <= 3,
didn't bother you (or anyone else) until recently.

Because the problem is in proposal Collections-ul.871, not trunk.
 
I proposed all kinds of solutions to minimize the difference between #new
and #new:, and we seemed to agree on one of those.

Cool, let's do one of those, thanks!

 - Chris
 
[1] --         (adding 9 elements to OrderedCollection new: 10)
                100% of baseline rate, 3,860,000 per second. 259 nanoseconds per run. 2.88 % GC time.'

                (adding 9 elements to OrderedCollection new: 3)
                72% of baseline rate, 2,790,000 per second. 359 nanoseconds per run. 3.55929 % GC time.

                (Dictionary new: 1) in trunk
                21,800,000 per second. 45.9 nanoseconds per run. 13.8 % GC time.

                 (Dicitionary new: 1) with Collections-ul.871
                 '16,300,000 per second. 61.4 nanoseconds per run. 5.05899 % GC time.'



Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: Collections-cmm.874.mcz

Chris Muller-4
Hi Nicolas,

Thanks for the interesting discussion.  I'd like you to know, I'm not gung-ho about this change, but do think we should seriously consider it for the benefit of Squeak.  I think the benefit is real, but deceptive.

It seems there are two dimensions to the decision:

  - legacy / compatibility
  - API design / user-expectations

I do respect your point about legacy, that writing #new has always meant you get something that can hold up to 10 elements before needing to grow, instead of only 3.
It "sounds" reasonable, but...

Here are some certainties:

    - Allocating a 3 element array is quicker than allocating 10 element one.

Hmm not sure about that. Not for single or few objects.
Allocating many larger short lived objects will increase the rate of scavenging statistically, but this will be measurable only is allocating massively this kind of objects IMO.

    - 3 element Array's take up less memory than 10 element ones
    - consuming more RAM can lead to slowdowns due to paging or GC overhead
    - In the worst possible case (e.g., doing it over and over and nothing else), adding 9 elements to an (OrderedCollection new: 3) is 72% the speed of adding 9 elements to an (OrderedCollection new: 10). see [1]


Typical Smalltalk objects are short lived.
That's why we get a generational garbage collection.
So typical usage is unlikely to generate paging.
Case of paging can only occur if many of these objects are longer lived (and tenured), which again is not typical.
For specific usage, there might be specific optimizations like growing a bit the Eden.

Here are some uncertainties:

     - there may be some code somewhere creating many thousands of OrderedCollections (if it were only a few, it wouldn't be noticed)
     - the many thousands are all created in a very short amount of time (if it were spread out over time, it wouldn't be noticed)
     - it then stores 7-9 elements in most of the OrderedCollections
     - in spite of all of the above, the author still wrote #new instead of #new:


All your analysis is exclusively focused on a specific un-typical usage of the library...
You are then trying to bend the general purpose library tothis specific case.
IMO this should be handled with a specific optimization.

My proposal to reduce the the defaultSize was never motivated by performance.  All of the above is simply a _rebuttal_ to Jakob's and Levente's concerns about performance being adversely affected in some code somewhere.  You and I are in agreement about the above -- it shows those concerns are inflated.

No, MY motivation for this proposal has always been for a better API design that can capture _something_, instead of, as you said, "trying to bend the general purpose library" to something we think is "reasonable" with arbitrary numbers like 10 for OrderedCollection.  I contend there is little to no basis for that number, but there CAN be a basis for choosing a small number; space efficiency and API clarity.  You mentioned "typical usage of the library" above, which I think it's impossible to define, but would love to hear your thoughts if there's something more tangibly obtainable than space.


I did propose using something like a SmallDictionary that is WAY more optimized for small size than hashed collection. 

Which I ignored because it missed the point of the proposal.  See above.
 
 
Collections-ul.871, just like the former version, creates dictionaries
matching those expectations, but Dictionaries returned by the new version
use less memory.
So, what's the problem?

It slowed down (Dictionary new: 1) relative to trunk, 
by a comparable margin as adding 9 elements to an (OrderedCollection new: 3) relative to an (OrderedCollection new: 10 
(see [1])

IMO this is typical biased usage of percentages... 
Saving 30% of a short duration or 30% of a long duration is not at all the same thing!
The former case is premature optimization presumably unless used in tight loops.

Right, again this was simply addressing Jakob and Levente's opposition to the proposal.  They had concerns about performance, these benchmarks were meant to allay them for the reason you stated.

As I said, all of my motivations are more outwardly focused; the API presented by Squeak, and user expectations thereof.  I agree with you that these concerns on performance are "premature", and shouldn't stop us from seriously considering this.

Best,
  Chris
 



Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: Collections-cmm.874.mcz

Jakob Reschke
Hi Chris,

I still don't see a compelling reason to change #new.

In one paragraph you write about using new: "proactively" and in the next you write this is not about performance.

My opinion on the API: if I start out with no specific knowledge about the expected size of my collection or I don't think that is relevant, I use new. That is 99% of the cases.

I expect that the standard library gives me a default that has proven to work well in most cases. By "work well" I predominantly think of time-efficiency, not space-efficiency.

We can look at other such libraries, OpenJDK also uses 10 for ArrayList: http://hg.openjdk.java.net/jdk/jdk/file/9211f6e20448/src/java.base/share/classes/java/util/ArrayList.java#l118

I only use new: when I know that it gives me a better result, after I noticed that I need a better result. "Proactive" from the start implies premature optimisation for me.

Also I don't expect that initial allocation with new: will always be faster. I expect that I can save reallocations later on. That is where the time is supposed to be saved. If new is faster than new: 1, I am fine with that. Allocating an Array of 1 or not allocating a Collection at all would be even better. I expect a benefit from new: mostly with large numbers.

If I needed new: with a small size because I have space problems, then I wouldn't use hashed collections in the first place. Don't make new possibly worse because it happens to suit your case, which is possibly special.

We have not seen in this thread any credible, real data on what "most cases" and "works well" really is, and I have none to give. There is probably some study or paper about it out there, I don't know. Not having the numbers seems to imply to you that the choice is arbitrary, or even random as you called it, but I beg to differ. If we make it worse in the average but unrevealed case, it is still a regression. A symptom would be if people after the change felt they have to use new: more often to achieve acceptable results. Chances are you won't even be able to measure such cognitive overhead for the users and that's really bad!

Patching new for a specific application is obviously only an option if the application is deployed alone (Sqeuak only as runtime) as opposed to installing the application into other images. So for your GraphQL engine it is not an option, sure. But one can always add another constructor as an extension method if one has special requirements.

Kind regards,
Jakob

Chris Muller <[hidden email]> schrieb am Mi., 29. Jan. 2020, 22:59:
Hi Nicolas,

Thanks for the interesting discussion.  I'd like you to know, I'm not gung-ho about this change, but do think we should seriously consider it for the benefit of Squeak.  I think the benefit is real, but deceptive.

It seems there are two dimensions to the decision:

  - legacy / compatibility
  - API design / user-expectations

I do respect your point about legacy, that writing #new has always meant you get something that can hold up to 10 elements before needing to grow, instead of only 3.
It "sounds" reasonable, but...

Here are some certainties:

    - Allocating a 3 element array is quicker than allocating 10 element one.

Hmm not sure about that. Not for single or few objects.
Allocating many larger short lived objects will increase the rate of scavenging statistically, but this will be measurable only is allocating massively this kind of objects IMO.

    - 3 element Array's take up less memory than 10 element ones
    - consuming more RAM can lead to slowdowns due to paging or GC overhead
    - In the worst possible case (e.g., doing it over and over and nothing else), adding 9 elements to an (OrderedCollection new: 3) is 72% the speed of adding 9 elements to an (OrderedCollection new: 10). see [1]


Typical Smalltalk objects are short lived.
That's why we get a generational garbage collection.
So typical usage is unlikely to generate paging.
Case of paging can only occur if many of these objects are longer lived (and tenured), which again is not typical.
For specific usage, there might be specific optimizations like growing a bit the Eden.

Here are some uncertainties:

     - there may be some code somewhere creating many thousands of OrderedCollections (if it were only a few, it wouldn't be noticed)
     - the many thousands are all created in a very short amount of time (if it were spread out over time, it wouldn't be noticed)
     - it then stores 7-9 elements in most of the OrderedCollections
     - in spite of all of the above, the author still wrote #new instead of #new:


All your analysis is exclusively focused on a specific un-typical usage of the library...
You are then trying to bend the general purpose library tothis specific case.
IMO this should be handled with a specific optimization.

My proposal to reduce the the defaultSize was never motivated by performance.  All of the above is simply a _rebuttal_ to Jakob's and Levente's concerns about performance being adversely affected in some code somewhere.  You and I are in agreement about the above -- it shows those concerns are inflated.

No, MY motivation for this proposal has always been for a better API design that can capture _something_, instead of, as you said, "trying to bend the general purpose library" to something we think is "reasonable" with arbitrary numbers like 10 for OrderedCollection.  I contend there is little to no basis for that number, but there CAN be a basis for choosing a small number; space efficiency and API clarity.  You mentioned "typical usage of the library" above, which I think it's impossible to define, but would love to hear your thoughts if there's something more tangibly obtainable than space.


I did propose using something like a SmallDictionary that is WAY more optimized for small size than hashed collection. 

Which I ignored because it missed the point of the proposal.  See above.
 
 
Collections-ul.871, just like the former version, creates dictionaries
matching those expectations, but Dictionaries returned by the new version
use less memory.
So, what's the problem?

It slowed down (Dictionary new: 1) relative to trunk, 
by a comparable margin as adding 9 elements to an (OrderedCollection new: 3) relative to an (OrderedCollection new: 10 
(see [1])

IMO this is typical biased usage of percentages... 
Saving 30% of a short duration or 30% of a long duration is not at all the same thing!
The former case is premature optimization presumably unless used in tight loops.

Right, again this was simply addressing Jakob and Levente's opposition to the proposal.  They had concerns about performance, these benchmarks were meant to allay them for the reason you stated.

As I said, all of my motivations are more outwardly focused; the API presented by Squeak, and user expectations thereof.  I agree with you that these concerns on performance are "premature", and shouldn't stop us from seriously considering this.

Best,
  Chris
 




Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: Collections-cmm.874.mcz

Chris Muller-3
Hi Jakob,

I still don't see a compelling reason to change #new.

Levente's proposal changes it.  That's what sparked this whole conversation...
 

In one paragraph you write about using new: "proactively" and in the next you write this is not about performance.

The "proactively" was only in response to Nicolas and you talking about "premature optimization", which was wrong.  When you say:

I only use new: when I know that it gives me a better result, after I noticed that I need a better result. "Proactive" from the start implies premature optimisation for me.

it's like saying, "Writing Dictionary>>#at: optimally before my application suffers performance problems is 'premature optimization'."  Totally wrong.
 

My opinion on the API: if I start out with no specific knowledge about the expected size of my collection or I don't think that is relevant, I use new.  That is 99% of the cases.

Those cases are irrelevant.  The cases we're discussing are the ones when the size is known at runtime.
 

I expect that the standard library gives me a default that has proven to work well in most cases. By "work well" I predominantly think of time-efficiency, not space-efficiency.

That sounds personal.  One thing that may help is what I suggested to Levente, try writing a manpage-quality comment for #new: explaining all the nuances...  when to use it, when not to, etc...  My hope that exercise would illuminate the issue I have.
 

We can look at other such libraries, OpenJDK also uses 10 for ArrayList: http://hg.openjdk.java.net/jdk/jdk/file/9211f6e20448/src/java.base/share/classes/java/util/ArrayList.java#l118


Java based its core library off Smalltalk, Jakob..
 

Also I don't expect that initial allocation with new: will always be faster. I expect that I can save reallocations later on.

As I said before, not if you know your Dictionary will never grow.   This is not app-specific.
 
That is where the time is supposed to be saved. If new is faster than new: 1, I am fine with that.

This thread is about reducing the defaultSize.  The other thread is about #new:1 needing to be same speed as #new.  They're two separate discussions.  An API design that forces developers to trade one optimization for another is bad design, plain and simple.  Levente's other changes in Collections-ul are good, but to introduce a needless API regression is pointless, and not okay.  We would need one solution or the other.

 - Chris


12