The defaullt implementation of isEmpty might do too much work

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
37 messages Options
12
Reply | Threaded
Open this post in threaded view
|

The defaullt implementation of isEmpty might do too much work

Nicolas Cellier
The default behavior is to rely on ^self size = 0.
But default size is doing too much (it iterates over all elements).
It's OK for most of our classes which know very well about their size and override #size.
It's not OK for an eventually lazy or infinite subclass.
I realized this by answering http://stackoverflow.com/questions/40149826/is-there-a-construct-like-iterate-iterable-if-it-has-elements-else


Reply | Threaded
Open this post in threaded view
|

Re: The defaullt implementation of isEmpty might do too much work

Eliot Miranda-2
Nicolas,

On Thu, Oct 20, 2016 at 9:40 AM, Nicolas Cellier <[hidden email]> wrote:
The default behavior is to rely on ^self size = 0.
But default size is doing too much (it iterates over all elements).
It's OK for most of our classes which know very well about their size and override #size.
It's not OK for an eventually lazy or infinite subclass.

Good catch.  Change it!

isEmpty
    self do: [:element| ^false].
    ^true

 

_,,,^..^,,,_
best, Eliot


Reply | Threaded
Open this post in threaded view
|

Re: The defaullt implementation of isEmpty might do too much work

Stéphane Rollandin
> isEmpty
>     self do: [:element| ^false].
>     ^true

Nice one.

Stef


Reply | Threaded
Open this post in threaded view
|

Re: The defaullt implementation of isEmpty might do too much work

monty-3
In reply to this post by Nicolas Cellier
All non-trivial collections implement #size or inherit a custom implementation, usually just as a return of an inst var. Your #do:-based one is 3x as slow in my tests, so you've now made #isEmpty slower for every collection that implements #size just to benefit ones whose careless authors didn't and so the implementors of lazy collections have one fewer message to override.

Do not do this. People already avoid #ifEmpty: and related messages where performance matters and shouldn't have to avoid #isEmpty too.

Reply | Threaded
Open this post in threaded view
|

Re: The defaullt implementation of isEmpty might do too much work

Eliot Miranda-2
Hi Monty,

    what happens if you add an isEmpty implement ration based in size to SequenceableCollection?

_,,,^..^,,,_ (phone)

> On Oct 21, 2016, at 3:59 PM, monty <[hidden email]> wrote:
>
> All non-trivial collections implement #size or inherit a custom implementation, usually just as a return of an inst var. Your #do:-based one is 3x as slow in my tests, so you've now made #isEmpty slower for every collection that implements #size just to benefit ones whose careless authors didn't and so the implementors of lazy collections have one fewer message to override.
>
> Do not do this. People already avoid #ifEmpty: and related messages where performance matters and shouldn't have to avoid #isEmpty too.
>

Reply | Threaded
Open this post in threaded view
|

Re: The defaullt implementation of isEmpty might do too much work

monty-3
Putting #size-based implementations in abstract Collection subclasses negates the lazy collection benefit and still penalizes non-abstract direct Collection subclasses like CharacterSet and any user-defined direct Collection subclass implemented using composition with forwarding to a concrete collection. If the default linear complexity of #size and #isEmpty bothers you so much, make #size a subclassResponsiblity like #do: instead of suddenly penalizing collections whose authors actually did the right thing by providing a non-linear #size implementation.

> Sent: Friday, October 21, 2016 at 11:26 PM
> From: "Eliot Miranda" <[hidden email]>
> To: "The general-purpose Squeak developers list" <[hidden email]>
> Subject: Re: [squeak-dev] Re: The defaullt implementation of isEmpty might do too much work
>
> Hi Monty,
>
>     what happens if you add an isEmpty implement ration based in size to SequenceableCollection?
>
> _,,,^..^,,,_ (phone)
>
> > On Oct 21, 2016, at 3:59 PM, monty <[hidden email]> wrote:
> >
> > All non-trivial collections implement #size or inherit a custom implementation, usually just as a return of an inst var. Your #do:-based one is 3x as slow in my tests, so you've now made #isEmpty slower for every collection that implements #size just to benefit ones whose careless authors didn't and so the implementors of lazy collections have one fewer message to override.
> >
> > Do not do this. People already avoid #ifEmpty: and related messages where performance matters and shouldn't have to avoid #isEmpty too.
> >
>
>

Reply | Threaded
Open this post in threaded view
|

Re: The defaullt implementation of isEmpty might do too much work

Nicolas Cellier
Please retry with latest trunk version.

2016-10-22 6:48 GMT+02:00 monty <[hidden email]>:
Putting #size-based implementations in abstract Collection subclasses negates the lazy collection benefit and still penalizes non-abstract direct Collection subclasses like CharacterSet and any user-defined direct Collection subclass implemented using composition with forwarding to a concrete collection. If the default linear complexity of #size and #isEmpty bothers you so much, make #size a subclassResponsiblity like #do: instead of suddenly penalizing collections whose authors actually did the right thing by providing a non-linear #size implementation.

> Sent: Friday, October 21, 2016 at 11:26 PM
> From: "Eliot Miranda" <[hidden email]>
> To: "The general-purpose Squeak developers list" <[hidden email]>
> Subject: Re: [squeak-dev] Re: The defaullt implementation of isEmpty might do too much work
>
> Hi Monty,
>
>     what happens if you add an isEmpty implement ration based in size to SequenceableCollection?
>
> _,,,^..^,,,_ (phone)
>
> > On Oct 21, 2016, at 3:59 PM, monty <[hidden email]> wrote:
> >
> > All non-trivial collections implement #size or inherit a custom implementation, usually just as a return of an inst var. Your #do:-based one is 3x as slow in my tests, so you've now made #isEmpty slower for every collection that implements #size just to benefit ones whose careless authors didn't and so the implementors of lazy collections have one fewer message to override.
> >
> > Do not do this. People already avoid #ifEmpty: and related messages where performance matters and shouldn't have to avoid #isEmpty too.
> >
>
>




Reply | Threaded
Open this post in threaded view
|

Re: The defaullt implementation of isEmpty might do too much work

monty-3
The latest trunk has numerous custom #isEmpty implementations, but that doesn't help custom collections outside the image not inheriting one. I maintain cross-platform projects that have such collections, like XPath with XPathFunctionSet or the BitmapCharacterSet package it relies on. Both are Collection subclasses, and now I feel like I should add this:
isEmpty
        ^ self size = 0

to every single collection I maintain. Why not just make #size a subclassResponsibility or add abstract superclasses for lazy or infinite collections that implement #isEmpty using #do: and change #size to shouldNotImplement?

>Sent: Saturday, October 22, 2016 at 8:42 AM
>From: "Nicolas Cellier" <[hidden email]>
>To: "The general-purpose Squeak developers list" <[hidden email]>
>Subject: Re: [squeak-dev] Re: The defaullt implementation of isEmpty might do too much work
>
>Please retry with latest trunk version.
>
>2016-10-22 6:48 GMT+02:00 monty <[hidden email]>:Putting #size-based implementations in abstract Collection subclasses negates the lazy collection benefit and still penalizes non-abstract direct Collection subclasses like CharacterSet and any user-defined direct Collection subclass implemented using composition with forwarding to a concrete collection. If the default linear complexity of #size and #isEmpty bothers you so much, make #size a subclassResponsiblity like #do: instead of suddenly penalizing collections whose authors actually did the right thing by providing a non-linear #size implementation.
>
>> Sent: Friday, October 21, 2016 at 11:26 PM
>> From: "Eliot Miranda" <[hidden email]>
>> To: "The general-purpose Squeak developers list" <[hidden email][mailto:[hidden email]]>
>> Subject: Re: [squeak-dev] Re: The defaullt implementation of isEmpty might do too much work
>
>>
>> Hi Monty,
>>
>>     what happens if you add an isEmpty implement ration based in size to SequenceableCollection?
>>
>> _,,,^..^,,,_ (phone)
>>
>> > On Oct 21, 2016, at 3:59 PM, monty <[hidden email]> wrote:
>> >
>> > All non-trivial collections implement #size or inherit a custom implementation, usually just as a return of an inst var. Your #do:-based one is 3x as slow in my tests, so you've now made #isEmpty slower for every collection that implements #size just to benefit ones whose careless authors didn't and so the implementors of lazy collections have one fewer message to override.
>> >
>> > Do not do this. People already avoid #ifEmpty: and related messages where performance matters and shouldn't have to avoid #isEmpty too.
>> >
>>
>>

Reply | Threaded
Open this post in threaded view
|

Re: The defaullt implementation of isEmpty might do too much work

Bert Freudenberg
On Monday, 24 October 2016, monty <[hidden email]> wrote:
Why not just make #size a subclassResponsibility or add abstract superclasses for lazy or infinite collections that implement #isEmpty using #do: and change #size to shouldNotImplement?

Because #do: is the only method required to be overridden by Collection subclasses. All other methods are implemented in terms of #do:.

So yes, if your Collection subclass has an optimized implementation for #isEmpty, then provide it. That is a small price to pay for making a class work optimally across different code bases.

- Bert - 


Reply | Threaded
Open this post in threaded view
|

Re: The defaullt implementation of isEmpty might do too much work

Eliot Miranda-2


On Mon, Oct 24, 2016 at 1:31 AM, Bert Freudenberg <[hidden email]> wrote:
On Monday, 24 October 2016, monty <[hidden email]> wrote:
Why not just make #size a subclassResponsibility or add abstract superclasses for lazy or infinite collections that implement #isEmpty using #do: and change #size to shouldNotImplement?

Because #do: is the only method required to be overridden by Collection subclasses. All other methods are implemented in terms of #do:.

+1
 
So yes, if your Collection subclass has an optimized implementation for #isEmpty, then provide it. That is a small price to pay for making a class work optimally across different code bases.

- Bert - 






--
_,,,^..^,,,_
best, Eliot


Reply | Threaded
Open this post in threaded view
|

Re: The defaullt implementation of isEmpty might do too much work

Chris Muller-3
In reply to this post by Bert Freudenberg
Normally its the Subclasses where implementation-specific
optimizations are done, not in the top abstractions.  At the level of
Collection, the code should be elegant, expressive, and defining of
the responsibilities by the messages it sends.  #size is something of
Collection, period, there is nothing wrong with sending it, and
checking size = 0 is more elegant than a block activation with
non-local return.

As a community that respects its user base, we need to consider the
*type* of damage that "Change it!" could cause to existing production
applications.  These applications were built on the expressive
implementation, but now they'll be sending #do: to that SQLCollection
instead of #size, which will cause the database to retrieve a full
ResultSet, send it back to the client, just so it can answer #isEmpty.

Its this kind of "silent" damage that is the worst.  Because its
invisible, not a an error, not a blow up.  Just an degradation in
performance (ironic, given our motive here) which, believably, would
not be caught in testing.

I agree with Monty.  Collection is abstract, there will never be an
instance of it.  No self-respecting subclass needs this optimized, and
trying to optimize it way up there forces too many assumptions about
implementation.  It makes the code less expressive while silently
inflicting potentially performance-killing pain onto legacy
applications until they're forced to sprinkle duplicate #isEmpty
implementations all about..

On Mon, Oct 24, 2016 at 3:31 AM, Bert Freudenberg <[hidden email]> wrote:

> On Monday, 24 October 2016, monty <[hidden email]> wrote:
>>
>> Why not just make #size a subclassResponsibility or add abstract
>> superclasses for lazy or infinite collections that implement #isEmpty using
>> #do: and change #size to shouldNotImplement?
>
>
> Because #do: is the only method required to be overridden by Collection
> subclasses. All other methods are implemented in terms of #do:.
>
> So yes, if your Collection subclass has an optimized implementation for
> #isEmpty, then provide it. That is a small price to pay for making a class
> work optimally across different code bases.
>
> - Bert -
>
>
>

Reply | Threaded
Open this post in threaded view
|

Re: The defaullt implementation of isEmpty might do too much work

Chris Muller-3
Since Smalltalk-80, sends to #do: have been considered expensive, and
sends to #size, inexpensive.  And so it has been okay for applications
which defined their own Collection subclasses, to add a few
milliseconds of cost to something that's already considered expensive
(#do:), and not to something which is expected to be fast (#size).

** For 30 years applications were built on these assumptions.  Even I
did it in Magma.  #isEmpty was built on that assumption.  Subclass
implementors have seemed to agree to these assumptions too, and made
their implementations true to them..

May we please reconsider this change?


On Mon, Oct 24, 2016 at 9:07 PM, Chris Muller <[hidden email]> wrote:

> Normally its the Subclasses where implementation-specific
> optimizations are done, not in the top abstractions.  At the level of
> Collection, the code should be elegant, expressive, and defining of
> the responsibilities by the messages it sends.  #size is something of
> Collection, period, there is nothing wrong with sending it, and
> checking size = 0 is more elegant than a block activation with
> non-local return.
>
> As a community that respects its user base, we need to consider the
> *type* of damage that "Change it!" could cause to existing production
> applications.  These applications were built on the expressive
> implementation, but now they'll be sending #do: to that SQLCollection
> instead of #size, which will cause the database to retrieve a full
> ResultSet, send it back to the client, just so it can answer #isEmpty.
>
> Its this kind of "silent" damage that is the worst.  Because its
> invisible, not a an error, not a blow up.  Just an degradation in
> performance (ironic, given our motive here) which, believably, would
> not be caught in testing.
>
> I agree with Monty.  Collection is abstract, there will never be an
> instance of it.  No self-respecting subclass needs this optimized, and
> trying to optimize it way up there forces too many assumptions about
> implementation.  It makes the code less expressive while silently
> inflicting potentially performance-killing pain onto legacy
> applications until they're forced to sprinkle duplicate #isEmpty
> implementations all about..
>
> On Mon, Oct 24, 2016 at 3:31 AM, Bert Freudenberg <[hidden email]> wrote:
>> On Monday, 24 October 2016, monty <[hidden email]> wrote:
>>>
>>> Why not just make #size a subclassResponsibility or add abstract
>>> superclasses for lazy or infinite collections that implement #isEmpty using
>>> #do: and change #size to shouldNotImplement?
>>
>>
>> Because #do: is the only method required to be overridden by Collection
>> subclasses. All other methods are implemented in terms of #do:.
>>
>> So yes, if your Collection subclass has an optimized implementation for
>> #isEmpty, then provide it. That is a small price to pay for making a class
>> work optimally across different code bases.
>>
>> - Bert -
>>
>>
>>

Reply | Threaded
Open this post in threaded view
|

Re: The defaullt implementation of isEmpty might do too much work

monty-3
In reply to this post by Bert Freudenberg
Except #do: isn't the only method required to be overridden. #add: and #remove:ifAbsent: are also abstract.

> Sent: Monday, October 24, 2016 at 4:31 AM
> From: "Bert Freudenberg" <[hidden email]>
> To: "The general-purpose Squeak developers list" <[hidden email]>
> Subject: Re: [squeak-dev] The defaullt implementation of isEmpty might do too much work
> On Monday, 24 October 2016, monty <[hidden email][mailto:[hidden email]]> wrote:Why not just make #size a subclassResponsibility or add abstract superclasses for lazy or infinite collections that implement #isEmpty using #do: and change #size to shouldNotImplement?
>  
> Because #do: is the only method required to be overridden by Collection subclasses. All other methods are implemented in terms of #do:.
>  
> So yes, if your Collection subclass has an optimized implementation for #isEmpty, then provide it. That is a small price to pay for making a class work optimally across different code bases.
>  
> - Bert -

Reply | Threaded
Open this post in threaded view
|

Re: The defaullt implementation of isEmpty might do too much work

Bert Freudenberg
On Tue, Oct 25, 2016 at 6:33 AM, monty <[hidden email]> wrote:
Except #do: isn't the only method required to be overridden. #add: and #remove:ifAbsent: are also abstract.

Only for extensible subclasses. These *can not* be implemented in terms of #do: so they have to be implemented in a subclass.

Collection provides a maximum of protocol with a minimum of methods required to be overridden.

This discussion is besides the point though. I was just pointing out that requiring #size or #isEmpty to be a subclass responsibility is not a good idea, because they can and should be implemented in terms of #do:. Whether #isEmpty should be implemented in terms of #size is a wholly different issue. 

Chris has some valid points about historical performance trade-offs which are worth discussing.

My take is that if we don't know *anything* about the subclass, we have to assume that #size can be way more expensive than a single iteration of #do: (e.g. a linked list). So using #do: is the sensible default implementation for #isEmpty. If OTOH a subclass has a very expensive #do:, then it's reasonable to assume that it would avoid doing much work if it is in fact empty.

If a Collection subclass did rely on an implementation detail of its superclass performance-wise then this is unfortunate, but easy to fix by implementing #isEmpty in terms of #size. We shouldn't let that get in the way of making our base libraries more elegant.

- Bert -


Reply | Threaded
Open this post in threaded view
|

Re: The defaullt implementation of isEmpty might do too much work

Chris Muller-3
> My take is that if we don't know *anything* about the subclass, we have to
> assume that #size can be way more expensive than a single iteration of #do:
> (e.g. a linked list).

I don't understand this.  How could #size ever be "way more expensive"
than a single iteration of #do:?  Based on the implementation in
Collection, that's impossible..

> So using #do: is the sensible default implementation
> for #isEmpty. If OTOH a subclass has a very expensive #do:, then it's
> reasonable to assume that it would avoid doing much work if it is in fact
> empty.
>
> If a Collection subclass did rely on an implementation detail of its
> superclass performance-wise then this is unfortunate,

Well that's precisely what this change does!  We're trying to fix some
hypothetical subclass to "rely" on this implementation detail way up
in Collection..

> but easy to fix by
> implementing #isEmpty in terms of #size.

You ignored the "silent damage" argument, that someone wouldn't even
KNOW it needs fixing.

Besides, the fix should be made in the subclass which doesn't have a fast #size.

Folks there is NO BASIS at the level of Collection for assuming that
do: [ : each | ^
false ] is faster than ^self size=0.  In fact, the proper assumption
is the opposite -- that #size is the optimized method, not #do:.

> We shouldn't let that get in the
> way of making our base libraries more elegant.

It makes them LESS elegant, because of 1) the violation of the
assumption that #size is faster than #do:, plus 2) the associated
duplication of code that will be required across multiple subclasses
just to recover that original faster implementation, which they used
to inherit.  Doesn't that matter?

I don't think the burden of proof should be on the legacy method, but
on the new implementation which purports some improvement.  This
change sounded good in the ivory towser, but can anyone identify *one
single place* in the real-world where this change is beneficial?

Because the detractors are potentially very real...

Reply | Threaded
Open this post in threaded view
|

Re: The defaullt implementation of isEmpty might do too much work

Tobias Pape

On 25.10.2016, at 21:45, Chris Muller <[hidden email]> wrote:

> Folks there is NO BASIS at the level of Collection for assuming that
> do: [ : each | ^
> false ] is faster than ^self size=0.  In fact, the proper assumption
> is the opposite -- that #size is the optimized method, not #do:.

I don't understand on what _either_ performance-assumption is grounded.
Since I've apparently not been around since the early years of either
ST-80 or Squeak, can someone enlighten me here?

Both ways seem somewhat reasonable:
 `self size = 0` is simple, easy to understand on first sight and said here to be more expensive than
 `self do: [:ea |  ^false]`, which is clever[1], possibly needs a double-take (or comment), but has a certain appeal of wholeness[2].

Best regards
        -Tobias








[1]: sadly, cleverness in software typically is a drawback.
[2]: I mean, the minimalism that is possible by just using/providing #do: somehow conveys closure/completeness
Reply | Threaded
Open this post in threaded view
|

Re: The defaullt implementation of isEmpty might do too much work

timrowledge

> On 25-10-2016, at 1:03 PM, Tobias Pape <[hidden email]> wrote:
> [1]: sadly, cleverness in software typically is a drawback.

Important point here; since debugging typically requires more cleverness than writing, making the code as clever as possible almost guarantees it cannot be debugged.


tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Useful random insult:- An 8086 in a StrongARM environment.



Reply | Threaded
Open this post in threaded view
|

Re: The defaullt implementation of isEmpty might do too much work

Eliot Miranda-2
In reply to this post by Tobias Pape
Hi Tobias, Hi All,

On Tue, Oct 25, 2016 at 1:03 PM, Tobias Pape <[hidden email]> wrote:

On 25.10.2016, at 21:45, Chris Muller <[hidden email]> wrote:

> Folks there is NO BASIS at the level of Collection for assuming that
> do: [ : each | ^
> false ] is faster than ^self size=0.  In fact, the proper assumption
> is the opposite -- that #size is the optimized method, not #do:.

I don't understand on what _either_ performance-assumption is grounded.
Since I've apparently not been around since the early years of either
ST-80 or Squeak, can someone enlighten me here?

Both ways seem somewhat reasonable:
 `self size = 0` is simple, easy to understand on first sight and said here to be more expensive than 
 `self do: [:ea |  ^false]`, which is clever[1], possibly needs a double-take (or comment), but has a certain appeal of wholeness[2].

self size = 0 doesn't work for infinite collections.  The suggested implementation of isEmpty does indeed require a comment (I didn't implement it) but it does work for infinite collections. The suggested implementation os isEmpty is also faster than elf size = 0 for any collection of sufficient size.

For example:

| b n |
b := Bag someInstance.
n := 100000.
{ b size. [1 to: n do: [:ign| b isEmptyOId]] timeToRun. [1 to: n do: [:ign| b isEmpty]] timeToRun } #(20402 3293 21)

and the cut off is quite low:


| n |
n := 10000.
((1 to: 10) collect:
[:iguana|
(1 to: SmallInteger maxVal) detect:
[:i|
b := (1 to: i) asBag.
[1 to: n do: [:ign| b isEmptyOId]] timeToRun > [1 to: n do: [:ign| b isEmpty]] timeToRun]]) average asFloat 4.9

So for collections that are 5 elements or more the "clever" implementation is faster.

Best regards
        -Tobias

[1]: sadly, cleverness in software typically is a drawback.
[2]: I mean, the minimalism that is possible by just using/providing #do: somehow conveys closure/completeness


_,,,^..^,,,_
best, Eliot


Reply | Threaded
Open this post in threaded view
|

Re: The defaullt implementation of isEmpty might do too much work

Nicolas Cellier
Hi, Chris, Monty, Tim,
did you see the implementation of Collection>>size?

size
    "Answer how many elements the receiver contains."

    | tally |
    tally := 0.
    self do: [:each | tally := tally + 1].
    ^ tally

Let's forget a minute about specific subclasses.
Does iterating over the whole collection is really the best idea for implementing isEmpty?
When we implement a message in Collection, we should think in term of Collection, not Interval, Set nor Array.

We're talking of several things here:
- universality of the default implementation:
  we don't need to know about the size to tell if empty
   not all collection have a known size
- optimization:
  old implementation is fast for collections having an optimized size, catastrophic for others.
  new implementation is reasonably fast, and full optimization can be restored with a few 1-liner.
- cleverness:
  IMO this is not more clever than, and quite logically connected to implementation of size

I agree that having several implementations for the sake of optimization is not ideal.
That's what would have curbed my enthusiasm.
Though I doubt about any noticeable slowdown in anything else than micro-benchmarks.



2016-10-25 22:49 GMT+02:00 Eliot Miranda <[hidden email]>:
Hi Tobias, Hi All,

On Tue, Oct 25, 2016 at 1:03 PM, Tobias Pape <[hidden email]> wrote:

On 25.10.2016, at 21:45, Chris Muller <[hidden email]> wrote:

> Folks there is NO BASIS at the level of Collection for assuming that
> do: [ : each | ^
> false ] is faster than ^self size=0.  In fact, the proper assumption
> is the opposite -- that #size is the optimized method, not #do:.

I don't understand on what _either_ performance-assumption is grounded.
Since I've apparently not been around since the early years of either
ST-80 or Squeak, can someone enlighten me here?

Both ways seem somewhat reasonable:
 `self size = 0` is simple, easy to understand on first sight and said here to be more expensive than 
 `self do: [:ea |  ^false]`, which is clever[1], possibly needs a double-take (or comment), but has a certain appeal of wholeness[2].

self size = 0 doesn't work for infinite collections.  The suggested implementation of isEmpty does indeed require a comment (I didn't implement it) but it does work for infinite collections. The suggested implementation os isEmpty is also faster than elf size = 0 for any collection of sufficient size.

For example:

| b n |
b := Bag someInstance.
n := 100000.
{ b size. [1 to: n do: [:ign| b isEmptyOId]] timeToRun. [1 to: n do: [:ign| b isEmpty]] timeToRun } #(20402 3293 21)

and the cut off is quite low:


| n |
n := 10000.
((1 to: 10) collect:
[:iguana|
(1 to: SmallInteger maxVal) detect:
[:i|
b := (1 to: i) asBag.
[1 to: n do: [:ign| b isEmptyOId]] timeToRun > [1 to: n do: [:ign| b isEmpty]] timeToRun]]) average asFloat 4.9

So for collections that are 5 elements or more the "clever" implementation is faster.

Best regards
        -Tobias

[1]: sadly, cleverness in software typically is a drawback.
[2]: I mean, the minimalism that is possible by just using/providing #do: somehow conveys closure/completeness


_,,,^..^,,,_
best, Eliot






Reply | Threaded
Open this post in threaded view
|

Re: The defaullt implementation of isEmpty might do too much work

Chris Muller-4
In reply to this post by Eliot Miranda-2
Hi Tobias and Eliot,

>> > Folks there is NO BASIS at the level of Collection for assuming that
>> > do: [ : each | ^
>> > false ] is faster than ^self size=0.  In fact, the proper assumption
>> > is the opposite -- that #size is the optimized method, not #do:.
>>
>> I don't understand on what _either_ performance-assumption is grounded.

Exactly.  The original basis for making this change is groundless.

 > self size = 0 doesn't work for infinite collections.

What's an infinite collection, and why wouldn't self size = 0 work for
its #isEmpty?  I assume it will need to override #size, so if it
answers Float infinity then #isEmpty is false.  What's not working?
And, irregardless, custom collections like that define what THEY need
to work properly (e.g., #size, #isEmpty, whatever) in their own class,
not going changing 30-year old base library code to make themselves
work..

>   The suggested implementation of isEmpty does indeed require a comment (I didn't implement it) but it does work for infinite collections. The suggested implementation os isEmpty is also faster than elf size = 0 for any collection of sufficient size.
>
> For example:
>
> | b n |
> b := Bag someInstance.
> n := 100000.
> { b size. [1 to: n do: [:ign| b isEmptyOId]] timeToRun. [1 to: n do: [:ign| b isEmpty]] timeToRun } #(20402 3293 21)
>
> and the cut off is quite low:
>
>
> | n |
> n := 10000.
> ((1 to: 10) collect:
> [:iguana|
> (1 to: SmallInteger maxVal) detect:
> [:i|
> b := (1 to: i) asBag.
> [1 to: n do: [:ign| b isEmptyOId]] timeToRun > [1 to: n do: [:ign| b isEmpty]] timeToRun]]) average asFloat 4.9
>
> So for collections that are 5 elements or more the "clever" implementation is faster.



Then add Bag>>#isEmpty, don't change Collection>>#isEmpty, because it
changes a 30-year old assumption about how #isEmpty works for every
other kind of Collection.

12