Smalltalk › Squeak › Squeak - Dev

Idea for a possibly better Collection occurrencesOf method.

_‹ Previous Topic Next Topic _›

Classic

List

Threaded

51 messages Options

123

Raymond W. Lucke IV

Idea for a possibly better Collection occurrencesOf method.

Hello,

I am new to the Squeak community, and new to Smalltalk itself. I have
been going through the browser looking at how things work and
stumbled upon the occurrences of method of Collection:

Collection >> occurrencesOf: anObject
"Answer how many of the receiver's elements are equal to anObject."

| tally |
tally _ 0.
self do: [:each | anObject = each ifTrue: [tally _ tally + 1]].
^tally

Is there any reason why it shouldn't look more like this:

Collection >> occurrencesOf: anObject
"Answer how many of the receiver's elements are equal to anObject."

^self inject: 0 into: [:tally :each |
anObject = each
ifTrue: [tally _ tally + 1]
ifFalse: [tally]]

It seems to me that not using inject there might be reinventing the
wheel. Please excuse my ignorance if I'm incorrect here, I'm trying
to learn Smalltalk and get involved. Any ideas?

Regards,

Ray

Zulq Alam-2

Re: Idea for a possibly better Collection occurrencesOf method.

Raymond W. Lucke IV wrote:

> It seems to me that not using inject there might be reinventing the
> wheel. Please excuse my ignorance if I'm incorrect here, I'm trying to
> learn Smalltalk and get involved. Any ideas?

Is duplication really a problem in this case? I don't think so. Since
you do, I'm curious what you think the risks are?

I do think it could be more readable though.

occurrencesOf: anObject
"Answer how many of the receiver's elements are equal to anObject."

^ self count: [:each | each = anObject]

Thanks,
Zulq.

Raymond W. Lucke IV

Re: Idea for a possibly better Collection occurrencesOf method.

Hi Zulq,

I didn't see any risks either way, I guess my thought was that it
would be more elegant to not use the temporary variables. Your way is
even more elegant. Is there any performance downside to using count
or inject instead of do with a temporary variable? It would seem to
me that it would be more efficient the way you just described as well.

Ray

On Sep 11, 2006, at 5:11 PM, Zulq Alam wrote:

> Raymond W. Lucke IV wrote:
>
>> It seems to me that not using inject there might be reinventing
>> the wheel. Please excuse my ignorance if I'm incorrect here, I'm
>> trying to learn Smalltalk and get involved. Any ideas?
>
> Is duplication really a problem in this case? I don't think so.
> Since you do, I'm curious what you think the risks are?
>
> I do think it could be more readable though.
>
> occurrencesOf: anObject
> "Answer how many of the receiver's elements are equal to anObject."
>
> ^ self count: [:each | each = anObject]
>
> Thanks,
> Zulq.
>
>
> !DSPAM:4505fab4278191804284693!
>

Ramon Leon-5

RE: Idea for a possibly better Collection occurrencesOf method.

In reply to this post by Raymond W. Lucke IV

> Is there any reason why it shouldn't look more like this:
>
> Collection >> occurrencesOf: anObject
> "Answer how many of the receiver's elements are equal
> to anObject."
>
> ^self inject: 0 into: [:tally :each |
> anObject = each
> ifTrue: [tally _ tally + 1]
> ifFalse: [tally]]
>
>
> Regards,
>
> Ray

Or..

Collection >> occurrencesOf: anObject
"Answer how many of the receiver's elements are equal to anObject."
^self inject: 0 into: [:tally :each |
anObject = each ifTrue: [tally + 1] ifFalse: [tally]]

No need to add modify tally, simply return it + 1. There's lot's of old
code in the image, without a comment, it's impossible to know why
occurrencesOf: or count: wasn't used, maybe the author just didn't know
about them. I wasn't aware of count: myself, but it should be implemented
with inject:into: as well, I'd imagine. Though implementing with count:
does seem the simplest solution.

timrowledge

Re: Idea for a possibly better Collection occurrencesOf method.

In reply to this post by Raymond W. Lucke IV

You might possibly be able to measure a small performance cost to
using inject:into: because it passes down to a virtually identical
loop (ie uses a temp, a do: loop etc) and costs you one extra message
send or so. If someone chose to write the compiler optimiser code to
inline inject:into: then it would not cost even that.

Given that modern machines are typically able to send 3-10 million
messages a second, it probably isn't worth anyone's effort.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Oxymorons: Airline Food

Raymond W. Lucke IV

Re: Idea for a possibly better Collection occurrencesOf method.

In reply to this post by Ramon Leon-5

Heh. I should have caught that. :)

Anyway, I'm new to Squeak and Smalltalk, and I've gotta say, you guys have a really great system here. I'm highly impressed and trying to get my hands dirty with it as much as possible so I can learn. I've enjoyed the process of learning Smalltalk much more than any other language I've worked with thanks to the nice interface that Squeak presents.

Thanks for developing such a powerful and enjoyable Smalltalk implementation!

Ray

On Sep 11, 2006, at 5:34 PM, Ramon Leon wrote:

anObject = each ifTrue: [tally + 1] ifFalse: [tally]]

No need to add modify tally, simply return it + 1. There's lot's of old

Andreas.Raab

Re: Idea for a possibly better Collection occurrencesOf method.

In reply to this post by timrowledge

tim Rowledge wrote:
> You might possibly be able to measure a small performance cost to using
> inject:into: because it passes down to a virtually identical loop (ie
> uses a temp, a do: loop etc) and costs you one extra message send or
> so. If someone chose to write the compiler optimiser code to inline
> inject:into: then it would not cost even that.

It is also an extra activation per element of the collection (because
the ifTrue:-block in the original version gets inlined) which actually
shows in a quick benchmark:
"Create a list with 1 million true/false elements"
list := Array new: 1000000.
1 to: list size do:[:i| list at: i put: i even].
"Search for occurances of true by various means"
anObject := true.
tally := 0.
time1 := [list do:[:each| anObject = each ifTrue:[tally := tally +
1]]] timeToRun.
time2 := [list inject: 0 into: [:tally :each |
anObject = each
ifTrue: [tally := tally + 1]
ifFalse: [tally]]] timeToRun.
time3 := [list count:[:each| each = anObject]] timeToRun.
{ time1. time2. time3}

This results in #(397 590 600) on my machine with the significant
difference that being between ~400ms (original version) vs. ~600ms (with
extra activation).

Cheers,
- Andreas

J J-6

Re: Idea for a possibly better Collection occurrencesOf method.

In reply to this post by Raymond W. Lucke IV

Yea, I think count: or inject: would be better. Maybe the occurencesOf:
predates those
methods and that's why they are not used?

>From: "Raymond W. Lucke IV" <[hidden email]>
>Reply-To: The general-purpose Squeak developers
>list<[hidden email]>
>To: The general-purpose Squeak developers
>list<[hidden email]>
>Subject: Re: Idea for a possibly better Collection occurrencesOf method.
>Date: Mon, 11 Sep 2006 17:29:56 -0700
>
>Hi Zulq,
>
>I didn't see any risks either way, I guess my thought was that it would be
>more elegant to not use the temporary variables. Your way is even more
>elegant. Is there any performance downside to using count or inject
>instead of do with a temporary variable? It would seem to me that it would
>be more efficient the way you just described as well.
>
>Ray
>
>
>On Sep 11, 2006, at 5:11 PM, Zulq Alam wrote:
>
>>Raymond W. Lucke IV wrote:
>>
>>>It seems to me that not using inject there might be reinventing the
>>>wheel. Please excuse my ignorance if I'm incorrect here, I'm trying to
>>>learn Smalltalk and get involved. Any ideas?
>>
>>Is duplication really a problem in this case? I don't think so. Since you
>>do, I'm curious what you think the risks are?
>>
>>I do think it could be more readable though.
>>
>>occurrencesOf: anObject
>> "Answer how many of the receiver's elements are equal to anObject."
>>
>> ^ self count: [:each | each = anObject]
>>
>>Thanks,
>>Zulq.
>>
>>
>>!DSPAM:4505fab4278191804284693!
>>
>
>

J J-6

Re: Idea for a possibly better Collection occurrencesOf method.

In reply to this post by timrowledge

I think it would be worth it to change this occurance, and any others to use
the library instead
of hand made counts like this. That way if someone does make some kind of
optimization then
this function will get that too.

>From: tim Rowledge <[hidden email]>
>Reply-To: The general-purpose Squeak developers
>list<[hidden email]>
>To: The general-purpose Squeak developers
>list<[hidden email]>
>Subject: Re: Idea for a possibly better Collection occurrencesOf method.
>Date: Mon, 11 Sep 2006 17:41:15 -0700
>
>You might possibly be able to measure a small performance cost to using
>inject:into: because it passes down to a virtually identical loop (ie uses
>a temp, a do: loop etc) and costs you one extra message send or so. If
>someone chose to write the compiler optimiser code to inline inject:into:
>then it would not cost even that.
>
>Given that modern machines are typically able to send 3-10 million
>messages a second, it probably isn't worth anyone's effort.
>
>tim
>--
>tim Rowledge; [hidden email]; http://www.rowledge.org/tim
>Oxymorons: Airline Food
>
>
>

J J-6

Re: Idea for a possibly better Collection occurrencesOf method.

In reply to this post by Andreas.Raab

That's really unfortunate that using the library is slowing then rolling
your own. :(

>From: Andreas Raab <[hidden email]>
>Reply-To: The general-purpose Squeak developers
>list<[hidden email]>
>To: The general-purpose Squeak developers
>list<[hidden email]>
>Subject: Re: Idea for a possibly better Collection occurrencesOf method.
>Date: Mon, 11 Sep 2006 22:03:12 -0400
>
>tim Rowledge wrote:
>>You might possibly be able to measure a small performance cost to using
>>inject:into: because it passes down to a virtually identical loop (ie uses
>>a temp, a do: loop etc) and costs you one extra message send or so. If
>>someone chose to write the compiler optimiser code to inline inject:into:
>>then it would not cost even that.
>
>It is also an extra activation per element of the collection (because the
>ifTrue:-block in the original version gets inlined) which actually shows in
>a quick benchmark:
> "Create a list with 1 million true/false elements"
> list := Array new: 1000000.
> 1 to: list size do:[:i| list at: i put: i even].
> "Search for occurances of true by various means"
> anObject := true.
> tally := 0.
> time1 := [list do:[:each| anObject = each ifTrue:[tally := tally + 1]]]
>timeToRun.
> time2 := [list inject: 0 into: [:tally :each |
> anObject = each
> ifTrue: [tally := tally + 1]
> ifFalse: [tally]]] timeToRun.
> time3 := [list count:[:each| each = anObject]] timeToRun.
> { time1. time2. time3}
>
>This results in #(397 590 600) on my machine with the significant
>difference that being between ~400ms (original version) vs. ~600ms (with
>extra activation).
>
>Cheers,
> - Andreas
>

Ramon Leon-4

Re: Idea for a possibly better Collection occurrencesOf method.

>>
>> It is also an extra activation per element of the collection (because
>> the ifTrue:-block in the original version gets inlined) which actually
>> shows in a quick benchmark:
>> "Create a list with 1 million true/false elements"
>> list := Array new: 1000000.
>> 1 to: list size do:[:i| list at: i put: i even].
>> "Search for occurances of true by various means"
>> anObject := true.
>> tally := 0.
>> time1 := [list do:[:each| anObject = each ifTrue:[tally := tally +
>> 1]]] timeToRun.
>> time2 := [list inject: 0 into: [:tally :each |
>> anObject = each
>> ifTrue: [tally := tally + 1]
>> ifFalse: [tally]]] timeToRun.
>> time3 := [list count:[:each| each = anObject]] timeToRun.
>> { time1. time2. time3}
>>
>> This results in #(397 590 600) on my machine with the significant
>> difference that being between ~400ms (original version) vs. ~600ms
>> (with extra activation).
>>
>> Cheers,
>> - Andreas

There's an unnecessary assignment in your inject into test. It should be...

time2 := [list inject: 0 into: [:tally :each |
anObject = each
ifTrue: [tally + 1]
ifFalse: [tally]]] timeToRun.

Not that it makes much difference in the benchmark.

Klaus D. Witzel

Re: Idea for a possibly better Collection occurrencesOf method.

In reply to this post by J J-6

Hi J J,

mind to explain your comment "that's really unfortunate that using the
library is slowing".

Whatever "better" method is used, they all have to send the #do: message
and so:

- #do: must be fastest
- users of #do: must be slower than pure #do:

Only under (very) rare conditions can *more* code yield *less* execution
time, I posted one of them as a puzzle some time ago

-
http://lists.squeakfoundation.org/pipermail/squeak-dev/2006-July/106462.html

/Klaus

On Tue, 12 Sep 2006 08:45:32 +0200, J J wrote:

> That's really unfortunate that using the library is slowing then rolling
> your own. :(

Ramon Leon-4

Re: Idea for a possibly better Collection occurrencesOf method.

>> That's really unfortunate that using the library is slowing then
>> rolling your own. :(

Don't forget the other side, using the library means you don't have to
remember how to roll your own.

Nicolas Cellier-3

Re: Idea for a possibly better Collection occurrencesOf method.

In reply to this post by Raymond W. Lucke IV

It's all right if you reduce a loop budget from N*N to N.
But just do not spend to much time on smaller optimization like inlining code.
Exupery should do it for you in near future.
It's better to keep code small and clean with inject:into: and count:
the reason for not using count: is probably historical (no count: message in st80).

Nicolas

________________________________________________________________________
iFRANCE, exprimez-vous !
http://web.ifrance.com

Jon Hylands

Re: Idea for a possibly better Collection occurrencesOf method.

In reply to this post by Klaus D. Witzel

On Tue, 12 Sep 2006 09:31:40 +0200, "Klaus D. Witzel"
<[hidden email]> wrote:

> Whatever "better" method is used, they all have to send the #do: message
> and so:
>
> - #do: must be fastest
> - users of #do: must be slower than pure #do:

A while ago, the guys at IBM did a very interesting performance upgrade to
VisualAge Smalltalk - they "fixed" the standard enumeration methods so they
all have (more or less) identical performance. They wrote a collection
iterator primitive method in BlockContext:

apply: aCollection from: start to: end

This method calls an iterator primitive, and all the collection iterators
end up calling this method, and it has taken away pretty much all the
performance differences between using #to:do:, #do:, #select:,
#inject:into:, etc.

It used to be that the only way to get fast collection iteration was to use
#to:do: (because it is inlined). Now they all run at the same speed, which
is really fast.

Later,
Jon

--------------------------------------------------------------
Jon Hylands [hidden email] http://www.huv.com/jon

Project: Micro Seeker (Micro Autonomous Underwater Vehicle)
http://www.huv.com

stephane ducasse-2

Re: Idea for a possibly better Collection occurrencesOf method.

In reply to this post by Andreas.Raab

At ESUG georg Heeg made an interesting point, he showed that in VW
loading an st file over 1000 only 1 was actually the compiler works
(I do not remember exactly the ratio but it was striking). They
aggressively optimized the loading and (removed the refreshing of
RB.....).

So this was really making me think that speed is still really again
an issue and that good optimized libraries are cool
even on our fast machines because we could do even more.

> This results in #(397 590 600) on my machine with the significant
> difference that being between ~400ms (original version) vs. ~600ms
> (with extra activation).

So this is clearly a good argument for do: :)

Stef

Klaus D. Witzel

Re: Idea for a possibly better Collection occurrencesOf method.

In reply to this post by Jon Hylands

On Tue, 12 Sep 2006 12:19:02 +0200, Jon Hylands wrote:

> On Tue, 12 Sep 2006 09:31:40 +0200, "Klaus D. Witzel" wrote:
>
>> Whatever "better" method is used, they all have to send the #do: message
>> and so:
>>
>> - #do: must be fastest
>> - users of #do: must be slower than pure #do:
>
> A while ago, the guys at IBM did a very interesting performance upgrade
> to
> VisualAge Smalltalk - they "fixed" the standard enumeration methods so
> they
> all have (more or less) identical performance. They wrote a collection
> iterator primitive method in BlockContext:
>
> apply: aCollection from: start to: end

Heh, why haven't we stupid non-corporate users come up with such a
brilliant idea ;-)

IMO this is (perhaps) something for Exupery.

> This method calls an iterator primitive, and all the collection iterators
> end up calling this method, and it has taken away pretty much all the
> performance differences between using #to:do:, #do:, #select:,
> #inject:into:, etc.
>
> It used to be that the only way to get fast collection iteration was to
> use
> #to:do: (because it is inlined). Now they all run at the same speed,
> which
> is really fast.

Thank you for sharing the find, Jon.

/Klaus

> Later,
> Jon

J J-6

Re: Idea for a possibly better Collection occurrencesOf method.

In reply to this post by Klaus D. Witzel

I misspelled "slower" (was thinking rolling and typing slower. :) ).

The point is, if using the library is noticably slower then just rolling
your own then
some people will always use there own.

Look at C++ for an example of what I mean.
std::string has been around for some years now, but it seems like every new
C++ framework
or substancial library has a "FrameWorkOrLibrary::String" defined. Why?
Performance. The
standard library string doesn't have the performance they want accross the
board so every
library writer ever rolls there own. As a result if you try to be a good
programmer and reuse
code you will end up writing a bunch of stupid conversion functions to all
these types.

I know we are just talking about 2 methods here, but it's a bad president to
set. Would the
implimentation Jon mentioned work, or what are the bad points? The debugger
wouldn't let
you walk the code then or?

Thanks

>From: "Klaus D. Witzel" <[hidden email]>
>Reply-To: The general-purpose Squeak developers
>list<[hidden email]>
>To: [hidden email]
>Subject: Re: Idea for a possibly better Collection occurrencesOf method.
>Date: Tue, 12 Sep 2006 09:31:40 +0200
>
>Hi J J,
>
>mind to explain your comment "that's really unfortunate that using the
>library is slowing".
>
>Whatever "better" method is used, they all have to send the #do: message
>and so:
>
>- #do: must be fastest
>- users of #do: must be slower than pure #do:
>
>Only under (very) rare conditions can *more* code yield *less* execution
>time, I posted one of them as a puzzle some time ago
>
>-
>http://lists.squeakfoundation.org/pipermail/squeak-dev/2006-July/106462.html
>
>/Klaus
>
>On Tue, 12 Sep 2006 08:45:32 +0200, J J wrote:
>
>>That's really unfortunate that using the library is slowing then rolling
>>your own. :(
>
>

Klaus D. Witzel

Re: Idea for a possibly better Collection occurrencesOf method.

On Tue, 12 Sep 2006 13:12:23 +0200, J J wrote:

> I misspelled "slower" (was thinking rolling and typing slower. :) ).

Ah, back in sync again (that's what I expected :)

> The point is, if using the library is noticably slower then just rolling
> your own then
> some people will always use there own.
> Look at C++ for an example of what I mean.
...
> Why? Performance.
...
> you will end up writing a bunch of stupid conversion functions to all
> these types.

Yes, as you said: lack of reuse(able code).

> I know we are just talking about 2 methods here,

... at the heart of the implementation of sets (like: numbers, magnitudes,
etc) in Smalltalk ...

> but it's a bad president to set. Would the
> implimentation Jon mentioned work,

100%[tm] perfect. This a typical Smalltalk solution. For an analogous
concept have a look at what the VM does with #at: and #at:put: (I mean the
atPutCache).

> or what are the bad points?

Zero (IMHO) iff (!) we'd write comments on why and how this new
#apply:from:to: must be as it is ;-)

> The debugger wouldn't let
> you walk the code then or?

Zero impact: the VM, when executing the #apply:from:to: primitive, will
send messages to your "do:" block and the debugger is (can be, dependent
on the implementation) in full control. Of course one can no longer change
the from and to arguments in the debugger, but: I could not care less 8-)

/Klaus

> Thanks

J J-6

Re: Idea for a possibly better Collection occurrencesOf method.

Nice, sounds like a done deal then. Will this make 3.9 release? :)

>From: "Klaus D. Witzel" <[hidden email]>
>Reply-To: The general-purpose Squeak developers
>list<[hidden email]>
>To: [hidden email]
>Subject: Re: Idea for a possibly better Collection occurrencesOf method.
>Date: Tue, 12 Sep 2006 13:33:43 +0200
>I could not care less 8-)
>

I'm glad you say this phrase right. Drives me crazy when people use "I
could care less" and I hear it so often I was starting to think I was the
only one to get it right. :)

123