Smalltalk › Cincom › VisualWorks

SequenceableCollection, comparing

_‹ Previous Topic Next Topic _›

Classic

List

Threaded

5 messages Options

Stefan Schmiedl

SequenceableCollection, comparing

While checking out Mocketry, I came upon SequenceableCollection>>=
and SequenceableCollection>>isSameSequenceAs. The latter seems to be
part of the former. Is there a technical reason for this code
duplication?

Thanks,
s.

Eliot Miranda-2

Re: SequenceableCollection, comparing

3 reasons. isSameSequenceAs: was written after #= (by me IIRC). Performance of #= can be important and the change would slow down comparing empty collections. If it ain't broke don't fix it. Hence my call was no to change #= to use isSameSequenceAs:.

On Feb 19, 2008 11:46 PM, Stefan Schmiedl <[hidden email]> wrote:

While checking out Mocketry, I came upon SequenceableCollection>>=
and SequenceableCollection>>isSameSequenceAs. The latter seems to be
part of the former. Is there a technical reason for this code
duplication?

Thanks,
s.

Stefan Schmiedl

Re: SequenceableCollection, comparing

Hi Eliot,

thanks for replying. I have done some more messing with the code,
please bear with me, I'm more testing my understanding of VisualWorks
than challenging your decisions.

On Wed, 20 Feb 2008 11:11:54 -0800
"Eliot Miranda" <[hidden email]> wrote:

> 3 reasons. isSameSequenceAs: was written after #= (by me IIRC).

I guessed so. It looks like it is an optimization for when you are
not concerned comparing baskets of apples to bags of oranges.
Empty containers are always sad :-)

> Performance of #= can be important and the change would slow down comparing
> empty collections.

(Time millisecondsToRun: [ |x c1 c2|
x := true.
c1 := OrderedCollection new.
c2 := c1 copy.
10000000 timesRepeat: [x := x & (c1 ### c2)].
x out
]) out

gives the following results on successive runs when
subsituting ### with

= 742, 712, 724
isSameSequenceAs: 407, 416, 408
myEquals 777, 789, 787

where myEquals is the "obvious" refactoring:

myEquals: otherCollection

self species == otherCollection species ifFalse: [^false].
^self isSameSequenceAs: otherCollection

10 million method calls make about a 20th of a second difference
on my 2GHz amd64. So I guess that comparing the species is way
more expensive than another method call. Which justifies the
existence of the separate method, but not the lack of refactoring.
I'm just a grasshopper, what do the masters think?

Is the code above suitable for measuring what I claim? I tried to
make it so that it's not trivially possible to optimize away what
I'm interested in.

Next, I changed c2 to be an empty array:

(Time millisecondsToRun: [ |x c1 c2|
x := true.
c1 := OrderedCollection new.
c2 := Array new.
10000000 timesRepeat: [x := x & (c1 ### c2)].
x out
]) out

= 597, 582, 612
isSameSequenceAs: 371, 412, 390
myEquals 581, 567, 576

isSameSequence is in the same ballbark as with two OrderedCollections,
both = and myEquals are significantly faster. As it should be, since
they leave early returning false.

> If it ain't broke don't fix it. Hence my call was no to
> change #= to use isSameSequenceAs:.

yeah... that one gets me regularly in trouble :-)

Regards,
s.

Eliot Miranda-2

Re: SequenceableCollection, comparing

Hi Stefan,

On Wed, Feb 20, 2008 at 11:58 AM, Stefan Schmiedl <[hidden email]> wrote:

Hi Eliot,

thanks for replying. I have done some more messing with the code,
please bear with me, I'm more testing my understanding of VisualWorks
than challenging your decisions.

On Wed, 20 Feb 2008 11:11:54 -0800
"Eliot Miranda" <[hidden email]> wrote:

> 3 reasons. isSameSequenceAs: was written after #= (by me IIRC).

I guessed so. It looks like it is an optimization for when you are
not concerned comparing baskets of apples to bags of oranges.
Empty containers are always sad :-)

> Performance of #= can be important and the change would slow down comparing
> empty collections.

(Time millisecondsToRun: [ |x c1 c2|
x := true.
c1 := OrderedCollection new.
c2 := c1 copy.
10000000 timesRepeat: [x := x & (c1 ### c2)].
x out
]) out

gives the following results on successive runs when
subsituting ### with

= 742, 712, 724
isSameSequenceAs: 407, 416, 408
myEquals 777, 789, 787

where myEquals is the "obvious" refactoring:

myEquals: otherCollection

self species == otherCollection species ifFalse: [^false].
^self isSameSequenceAs: otherCollection

10 million method calls make about a 20th of a second difference
on my 2GHz amd64. So I guess that comparing the species is way
more expensive than another method call. Which justifies the
existence of the separate method, but not the lack of refactoring.
I'm just a grasshopper, what do the masters think?

Looks like a (742 + 712 + 724) - (777 + 789 + 787) / (742 + 712 + 724) asFloat, or 8% slowdown. Quite significant.

Is the code above suitable for measuring what I claim? I tried to
make it so that it's not trivially possible to optimize away what
I'm interested in.

Next, I changed c2 to be an empty array:

(Time millisecondsToRun: [ |x c1 c2|
x := true.
c1 := OrderedCollection new.
c2 := Array new.
10000000 timesRepeat: [x := x & (c1 ### c2)].
x out
]) out

= 597, 582, 612
isSameSequenceAs: 371, 412, 390
myEquals 581, 567, 576

isSameSequence is in the same ballbark as with two OrderedCollections,
both = and myEquals are significantly faster. As it should be, since
they leave early returning false.

?? your figures show isSameSequenceAs: is faster, around 400ms as opposed to ~600ms.

P.S. try something like the following for more stable measurements:

BlockClosure methods for benchmarking
benchmarkMicrosecondsToRunRepetitions: n
    | start end |
    Smalltalk garbageCollect. "side-effect is to flush the compiled code cache"
    start := Time microsecondClock. "invoke once to compile the code"
    self value. "run once to compile the code"
    start := Time microsecodClock.
    1 to: n do: [:i| self value]. "compiler generates inline code for this loop."
    end := Time microsecondClock.
    ^(end - start) / n

> If it ain't broke don't fix it. Hence my call was no to

> change #= to use isSameSequenceAs:.

yeah... that one gets me regularly in trouble :-)

:) The three reasons together still do it for me.

Regards,
s.

Best
Eliot

Stefan Schmiedl

Re: SequenceableCollection, comparing

On Wed, 20 Feb 2008 18:31:27 -0800
"Eliot Miranda" <[hidden email]> wrote:

> >
> > = 597, 582, 612
> > isSameSequenceAs: 371, 412, 390
> > myEquals 581, 567, 576
> >
> > isSameSequence is in the same ballbark as with two OrderedCollections,
> > both = and myEquals are significantly faster. As it should be, since
> > they leave early returning false.
>
>
> ?? your figures show isSameSequenceAs: is faster, around 400ms as opposed
> to ~600ms.

"both = and myEquals are significantly faster" with OC and Array than
with OC and OC. Sorry for the confusion.

> P.S. try something like the following for more stable measurements:
>

Thanks, that one goes straight into my toolbox.

> :) The three reasons together still do it for me.

As I said, I do this for the learning experience. E.g. I learned that
there is a method isSameSequenceAs: already doing size comparisons,
which directly translates into shorter code in my packages.

Thanks,
s.