The Inbox: Collections-cmm.874.mcz

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
24 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: Collections-cmm.874.mcz

Jakob Reschke

Chris Muller <[hidden email]> schrieb am Di., 4. Feb. 2020, 04:39:

I still don't see a compelling reason to change #new.

Levente's proposal changes it.  That's what sparked this whole conversation...

Your commit at the top of this thread changes the default capacity and that is what we are writing about. 

 When you say:

I only use new: when I know that it gives me a better result, after I noticed that I need a better result. "Proactive" from the start implies premature optimisation for me.

it's like saying, "Writing Dictionary>>#at: optimally before my application suffers performance problems is 'premature optimization'."  Totally wrong.

I think the analogy is flawed. There is a space/time tradeoff with the size of the preallocated array. I see no such thing with Dictionary>>at:. Also new allows its user to defer a decision/analysis, at: does not.

 

My opinion on the API: if I start out with no specific knowledge about the expected size of my collection or I don't think that is relevant, I use new.  That is 99% of the cases.

Those cases are irrelevant.  The cases we're discussing are the ones when the size is known at runtime.

No, not at all! They are relevant since we are discussing about new and the default capacity, not new:.

Also you saying 99% of my cases are not worthy of the performance they used to have... does not sound appealing to me.

try writing a manpage-quality comment for #new: explaining all the nuances...  when to use it, when not to, etc...  My hope that exercise would illuminate the issue I have.

"Allocate a new instance of me with the given initial capacity. Use new: instead of new if you want to avoid either unused capacity or repeated growing (and thus, copying) of the collection's memory."

Not quite man page length, but seems adequate to cover my expectations.

 

We can look at other such libraries, OpenJDK also uses 10 for ArrayList: http://hg.openjdk.java.net/jdk/jdk/file/9211f6e20448/src/java.base/share/classes/java/util/ArrayList.java#l118


Java based its core library off Smalltalk, Jakob..

What is that supposed to mean? Do you think the Java designers renamed almost all of the classes and methods, rewrote all of the documentation, but forgot to think about changing the initial capacity of ArrayList ex OrderedCollection?

In fact they did change something about it in Java 8 as I found out after my last post: initially the array is empty (singleton), but it still allocates 10 slots once the first element is added. This optimizes space for empty  collections, but not small ones.

 

Also I don't expect that initial allocation with new: will always be faster. I expect that I can save reallocations later on.

As I said before, not if you know your Dictionary will never grow.   This is not app-specific.

If I use new: to optimize performance I am fairly certain that it won't grow afterwards. My original statement still stands.

 
That is where the time is supposed to be saved. If new is faster than new: 1, I am fine with that.

This thread is about reducing the defaultSize.  The other thread is about #new:1 needing to be same speed as #new.  They're two separate discussions.  An API design that forces developers to trade one optimization for another is bad design, plain and simple.

Then the only sensible solution would be to deprecate and remove new for preallocating collections altogether. Forcing everyone to always think ahead of their collection sizes. For collections that are supposed to grow automatically. Slowing down developent of even non-performance-critical parts. I don't want that either.

Changing the current behavior can make it worse for existing applications (again: we lack the numbers to be sure of the contrary). Keeping the current behavior is bad with regards to your concerns. I say: let's therefore not change it, since user code has already been written and most of it is unlikely to be improved again.


Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: Collections-cmm.874.mcz

Chris Muller-4
try writing a manpage-quality comment for #new: explaining all the nuances...  when to use it, when not to, etc...  My hope that exercise would illuminate the issue I have.

"Allocate a new instance of me with the given initial capacity. Use new: instead of new if you want to avoid either unused capacity or repeated growing (and thus, copying) of the collection's memory."

The goal of using #new: is efficiency in general, space OR time (or, both).  So w.r.t. the case of optimizing for time, you forgot to mention the 40% performance hit they would take if the size happened to be 1 or 2...
 
Not quite man page length, but seems adequate to cover my expectations.

Make a spreadsheet with the cartesian product of the following, one per row, and then manually fill out an extra column whether you'd write #new or #new:.

    (knows size | does't know size) 
    * (knows at runtime | knows at compile time) 
    * (will possibly grow | won't possibly grow)
    * ( ... | ...)

When designing a class libary, please think in these general terms and consider all cases, instead of only your own expectations.
 
Also I don't expect that initial allocation with new: will always be faster. I expect that I can save reallocations later on.

As I said before, not if you know your Dictionary will never grow.   This is not app-specific.

If I use new: to optimize performance I am fairly certain that it won't grow afterwards. My original statement still stands.

You said "fairly" certain, now please consider the case where it's certain never to grow (perhaps row 187 on the spreadsheet, above).  I gave you a common, concrete example in an earlier post.  That's the case where your incomplete manpage comment (under Collections-ul.871) could easily trick the developer into hurting her apps performance when she thought she was helping it.  You argued (with flawed logic) that people would have to "think about the sizes all the time" but that's exactly what this would introduce, except in a much more insidious way!  Because it affects negatively depending on the runtime parameter value passed to #new:.
  
That is where the time is supposed to be saved. If new is faster than new: 1, I am fine with that.

You say you're "fine" with that certain performance hit, but you're not fine with this uncertain one whose worst-case possibility was measured to be an equal performance cost.  And, you're even willing to trade away certainty of space-efficiency, too.

This thread is about reducing the defaultSize.  The other thread is about #new:1 needing to be same speed as #new.  They're two separate discussions.  An API design that forces developers to trade one optimization for another is bad design, plain and simple.

Then the only sensible solution would be to deprecate and remove new for preallocating collections altogether. Forcing everyone to always think ahead of their collection sizes.

Sigh.  With statements like the above, I wonder whether you're actually interested in finding a solution or simply arguing...  :(
 
For collections that are supposed to grow automatically. Slowing down developent of even non-performance-critical parts. I don't want that either.

This was your best argument, but it only takes about 10 seconds to understand its logical flaw -- if you're writing performance-critical code, you're going to write #new: anyway.  If you're not, an extra grow from writing #new won't affect performance.

Changing the current behavior can make it worse for existing applications (again: we lack the numbers to be sure of the contrary).

The worst-case scenario was benched and provided for you, and shown to be a similar effect as Collections-ul.871 on #new: 1.
 
Keeping the current behavior is bad with regards to your concerns.

As I said multiple times, the "current behavior" is fine to my concerns, Jakob, it was Levente's proposal which introduced them.  This was just one of two different solutions proposed, but you handily rejected them both, even while offering zero alternatives or even acknowledgement that my concerns are valid.  I'm going to cut my losses from this discussion now.  Your arguments seem like you either aren't understanding mine, or aren't willing to look for consensus.

Levente has committed Collections-ul.872 with only his other changes, e.g., excluding the one which introduced my concern.  He's a gentleman.  It's sad that our inability to resolve this petty squabble kept back his other improvements, which would've included the ability to create minimally small Dictionary's with a minimum internal array size of 3 instead of 5.  :(

 - Chris


Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: Collections-cmm.874.mcz

David T. Lewis
This conversation might well get a prize for lowest signal to noise ratio
in the entire history of the Internet.

It began with a proposed change that would have required toggling two bits
in the source code, which is close to a theoritical minumum for information
content:

  - Current code: '^ self basicNew initialize: 5'
  - Proposed change" '^ self basicNew initialize: 3'

This requires toggling two bits in a single ascii character:
    ($5 asciiValue bitXor: $3 asciiValue) bitCount ==> 2

Changing two bits in the source code of Squeak comes impressively close
to being the smallest thoeretically possible change, and these two
bit-twiddles now form the basis of a vast email thread that has evolved
to include an impressive mix of faulty assumptions, untested metrics,
passionate opinions, and invalid conclusions.

It might seem easy to suggest moving Collections-cmm.874 to the treated
inbox - after all, everyone who reviewed it gave it a solid thumbs
down - but that would spoil the fun. Let's keep this thing going as
long as we can, and in the end we can submit the thread for inclusion
in the Guinness book of world records.

;-)

Dave


On Wed, Feb 05, 2020 at 04:39:09PM -0600, Chris Muller wrote:

> >
> > try writing a manpage-quality comment for #new: explaining all the
> >> nuances...  when to use it, when not to, etc...  My hope that exercise
> >> would illuminate the issue I have.
> >>
> >
> > "Allocate a new instance of me with the given initial capacity. Use new:
> > instead of new if you want to avoid either unused capacity or repeated
> > growing (and thus, copying) of the collection's memory."
> >
>
> The goal of using #new: is efficiency in general, space OR time (or,
> both).  So w.r.t. the case of optimizing for time, you forgot to mention
> the 40% performance hit they would take if the size happened to be 1 or 2...
>
>
> > Not quite man page length, but seems adequate to cover my expectations.
> >
>
> Make a spreadsheet with the cartesian product of the following, one per
> row, and then manually fill out an extra column whether you'd write #new or
> #new:.
>
>     (knows size | does't know size)
>     * (knows at runtime | knows at compile time)
>     * (will possibly grow | won't possibly grow)
>     * ( ... | ...)
>
> When designing a class libary, please think in these general terms and
> consider *all cases*, instead of only your own expectations.
>
>
> > Also I don't expect that initial allocation with new: will always be
> >>> faster. I expect that I can save reallocations later on.
> >>>
> >>
> >> As I said before, not if you know your Dictionary will never grow.   This
> >> is not app-specific.
> >>
> >
> > If I use new: to optimize performance I am fairly certain that it won't
> > grow afterwards. My original statement still stands.
> >
>
> You said "fairly" certain, now please consider the case where it's
> *certain* never
> to grow (perhaps row 187 on the spreadsheet, above).  I gave you a common,
> concrete example in an earlier post.  That's the case where your incomplete
> manpage comment (under Collections-ul.871) could easily trick the developer
> into hurting her apps performance when she thought she was helping it.  You
> argued (with flawed logic) that people would have to "think about the sizes
> all the time" but that's exactly what this would introduce, except in a
> much more insidious way!  Because it affects negatively depending on
> the *runtime
> parameter* value passed to #new:.
>
>
> > That is where the time is supposed to be saved. If new is faster than new:
> >>> 1, I am fine with that.
> >>>
> >>
> You say you're "fine" with that *certain* performance hit, but you're not
> fine with this uncertain one whose worst-case possibility was measured to
> be an equal performance cost.  *And*, you're even willing to trade away
> certainty of space-efficiency, too.
>
> This thread is about reducing the defaultSize.  The other thread is about
> >> #new:1 needing to be same speed as #new.  They're two separate
> >> discussions.  An API design that forces developers to trade one
> >> optimization for another is bad design, plain and simple.
> >>
> >
> > Then the only sensible solution would be to deprecate and remove new for
> > preallocating collections altogether. Forcing everyone to always think
> > ahead of their collection sizes.
> >
>
> Sigh.  With statements like the above, I wonder whether you're actually
> interested in finding a solution or simply arguing...  :(
>
>
> > For collections that are supposed to grow automatically. Slowing down
> > developent of even non-performance-critical parts. I don't want that either.
> >
>
> This was your best argument, but it only takes about 10 seconds to
> understand its logical flaw -- if you're writing performance-critical code,
> you're going to write #new: anyway.  If you're not, an extra grow from
> writing #new won't affect performance.
>
> Changing the current behavior can make it worse for existing applications
> > (again: we lack the numbers to be sure of the contrary).
> >
>
> The worst-case scenario was benched and provided for you, and shown to be a
> similar effect as Collections-ul.871 on #new: 1.
>
>
> > Keeping the current behavior is bad with regards to your concerns.
> >
>
> As I said multiple times, the "current behavior" is *fine* to my concerns,
> Jakob, it was Levente's proposal which introduced them.  This was just one
> of two different solutions proposed, but you handily rejected them both,
> even while offering zero alternatives or even acknowledgement that my
> concerns are valid.  I'm going to cut my losses from this discussion now.
> Your arguments seem like you either aren't understanding mine, or aren't
> willing to look for consensus.
>
> Levente has committed Collections-ul.872 with only his other changes, e.g.,
> excluding the one which introduced my concern.  He's a gentleman.  It's sad
> that our inability to resolve this petty squabble kept back his other
> improvements, which would've included the ability to create minimally small
> Dictionary's with a minimum internal array size of 3 instead of 5.  :(
>
>  - Chris

>


Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: Collections-cmm.874.mcz

Stéphane Rollandin
> This conversation might well get a prize for lowest signal to noise ratio
> in the entire history of the Internet.

Indeed, but some sort of gentle bliss is to be found in the
contemplation of an epic fight over such a small issue - at least this
time the world as we know it is not at stake, and we can cosily relax
while listening to the mildly terrifying wind sounds coming from the
tempest in a teapot.

> It might seem easy to suggest moving Collections-cmm.874 to the treated
> inbox - after all, everyone who reviewed it gave it a solid thumbs
> down - but that would spoil the fun.

Yes, keep it rolling - it may become its own art form.

Stef

12