SequenceableCollection, comparing

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

SequenceableCollection, comparing

Stefan Schmiedl
While checking out Mocketry, I came upon SequenceableCollection>>=
and SequenceableCollection>>isSameSequenceAs. The latter seems to be
part of the former. Is there a technical reason for this code
duplication?

Thanks,
s.

Reply | Threaded
Open this post in threaded view
|

Re: SequenceableCollection, comparing

Eliot Miranda-2
3 reasons.  isSameSequenceAs: was written after #= (by me IIRC).  Performance of #= can be important and the change would slow down comparing empty collections.  If it ain't broke don't fix it. Hence my call was no to change #= to use isSameSequenceAs:.

On Feb 19, 2008 11:46 PM, Stefan Schmiedl <[hidden email]> wrote:
While checking out Mocketry, I came upon SequenceableCollection>>=
and SequenceableCollection>>isSameSequenceAs. The latter seems to be
part of the former. Is there a technical reason for this code
duplication?

Thanks,
s.


Reply | Threaded
Open this post in threaded view
|

Re: SequenceableCollection, comparing

Stefan Schmiedl
Hi Eliot,

thanks for replying. I have done some more messing with the code,
please bear with me, I'm more testing my understanding of VisualWorks
than challenging your decisions.

On Wed, 20 Feb 2008 11:11:54 -0800
"Eliot Miranda" <[hidden email]> wrote:

> 3 reasons.  isSameSequenceAs: was written after #= (by me IIRC).

I guessed so. It looks like it is an optimization for when you are
not concerned comparing baskets of apples to bags of oranges.
Empty containers are always sad :-)

> Performance of #= can be important and the change would slow down comparing
> empty collections.  

        (Time millisecondsToRun: [ |x c1 c2|
                x := true.
                c1 := OrderedCollection new.
                c2 := c1 copy.
                10000000 timesRepeat: [x := x & (c1 ### c2)].
                x out
        ]) out

gives the following results on successive runs when
subsituting ### with

        =                    742, 712, 724
        isSameSequenceAs:    407, 416, 408
        myEquals             777, 789, 787

where myEquals is the "obvious" refactoring:

        myEquals: otherCollection
       
                self species == otherCollection species ifFalse: [^false].
                ^self isSameSequenceAs: otherCollection

10 million method calls make about a 20th of a second difference
on my 2GHz amd64.  So I guess that comparing the species is way
more expensive than another method call. Which justifies the
existence of the separate method, but not the lack of refactoring.
I'm just a grasshopper, what do the masters think?

Is the code above suitable for measuring what I claim? I tried to
make it so that it's not trivially possible to optimize away what
I'm interested in.

Next, I changed c2 to be an empty array:

        (Time millisecondsToRun: [ |x c1 c2|
                x := true.
                c1 := OrderedCollection new.
                c2 := Array new.
                10000000 timesRepeat: [x := x & (c1 ### c2)].
                x out
        ]) out

        =                    597, 582, 612
        isSameSequenceAs:    371, 412, 390
        myEquals             581, 567, 576

isSameSequence is in the same ballbark as with two OrderedCollections,
both = and myEquals are significantly faster. As it should be, since
they leave early returning false.

> If it ain't broke don't fix it. Hence my call was no to
> change #= to use isSameSequenceAs:.

yeah... that one gets me regularly in trouble :-)

Regards,
s.

Reply | Threaded
Open this post in threaded view
|

Re: SequenceableCollection, comparing

Eliot Miranda-2
Hi Stefan,

On Wed, Feb 20, 2008 at 11:58 AM, Stefan Schmiedl <[hidden email]> wrote:
Hi Eliot,

thanks for replying. I have done some more messing with the code,
please bear with me, I'm more testing my understanding of VisualWorks
than challenging your decisions.

On Wed, 20 Feb 2008 11:11:54 -0800
"Eliot Miranda" <[hidden email]> wrote:

> 3 reasons.  isSameSequenceAs: was written after #= (by me IIRC).

I guessed so. It looks like it is an optimization for when you are
not concerned comparing baskets of apples to bags of oranges.
Empty containers are always sad :-)

> Performance of #= can be important and the change would slow down comparing
> empty collections.

       (Time millisecondsToRun: [ |x c1 c2|
               x := true.
               c1 := OrderedCollection new.
               c2 := c1 copy.
               10000000 timesRepeat: [x := x & (c1 ### c2)].
               x out
       ]) out

gives the following results on successive runs when
subsituting ### with

       =                    742, 712, 724
       isSameSequenceAs:    407, 416, 408
       myEquals             777, 789, 787


where myEquals is the "obvious" refactoring:

       myEquals: otherCollection

               self species == otherCollection species ifFalse: [^false].
               ^self isSameSequenceAs: otherCollection

10 million method calls make about a 20th of a second difference
on my 2GHz amd64.  So I guess that comparing the species is way
more expensive than another method call. Which justifies the
existence of the separate method, but not the lack of refactoring.
I'm just a grasshopper, what do the masters think?

Looks like a  (742 + 712 + 724) - (777 + 789 + 787) / (742 + 712 + 724) asFloat, or 8% slowdown.  Quite significant.
 
Is the code above suitable for measuring what I claim? I tried to
make it so that it's not trivially possible to optimize away what
I'm interested in.

Next, I changed c2 to be an empty array:

       (Time millisecondsToRun: [ |x c1 c2|
               x := true.
               c1 := OrderedCollection new.
               c2 := Array new.
               10000000 timesRepeat: [x := x & (c1 ### c2)].
               x out
       ]) out

       =                    597, 582, 612
       isSameSequenceAs:    371, 412, 390
       myEquals             581, 567, 576

isSameSequence is in the same ballbark as with two OrderedCollections,
both = and myEquals are significantly faster. As it should be, since
they leave early returning false.

??  your figures show isSameSequenceAs: is faster, around 400ms as opposed to ~600ms.

P.S. try something like the following for more stable measurements:

BlockClosure methods for benchmarking
benchmarkMicrosecondsToRunRepetitions: n
    | start end |
    Smalltalk garbageCollect. "side-effect is to flush the compiled code cache"
    start := Time microsecondClock. "invoke once to compile the code"
    self value. "run once to compile the code"
    start := Time microsecodClock.
    1 to: n do: [:i| self value]. "compiler generates inline code for this loop."
    end := Time microsecondClock.
    ^(end - start) / n

> If it ain't broke don't fix it. Hence my call was no to
> change #= to use isSameSequenceAs:.

yeah... that one gets me regularly in trouble :-)

:)  The three reasons together still do it for me.

Regards,
s.

Best
Eliot
Reply | Threaded
Open this post in threaded view
|

Re: SequenceableCollection, comparing

Stefan Schmiedl
On Wed, 20 Feb 2008 18:31:27 -0800
"Eliot Miranda" <[hidden email]> wrote:

> >
> >        =                    597, 582, 612
> >        isSameSequenceAs:    371, 412, 390
> >        myEquals             581, 567, 576
> >
> > isSameSequence is in the same ballbark as with two OrderedCollections,
> > both = and myEquals are significantly faster. As it should be, since
> > they leave early returning false.
>
>
> ??  your figures show isSameSequenceAs: is faster, around 400ms as opposed
> to ~600ms.

"both = and myEquals are significantly faster" with OC and Array than
with OC and OC. Sorry for the confusion.

> P.S. try something like the following for more stable measurements:
>

Thanks, that one goes straight into my toolbox.

> :)  The three reasons together still do it for me.

As I said, I do this for the learning experience. E.g. I learned that
there is a method isSameSequenceAs: already doing size comparisons,
which directly translates into shorter code in my packages.


Thanks,
s.