Idea for a possibly better Collection occurrencesOf method.

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
51 messages Options
123
Reply | Threaded
Open this post in threaded view
|

Idea for a possibly better Collection occurrencesOf method.

Raymond W. Lucke IV
Hello,

I am new to the Squeak community, and new to Smalltalk itself. I have  
been going through the browser looking at how things work and  
stumbled upon the occurrences of method of Collection:

Collection >> occurrencesOf: anObject
        "Answer how many of the receiver's elements are equal to anObject."

        | tally |
        tally _ 0.
        self do: [:each | anObject = each ifTrue: [tally _ tally + 1]].
        ^tally

Is there any reason why it shouldn't look more like this:

Collection >> occurrencesOf: anObject
        "Answer how many of the receiver's elements are equal to anObject."

        ^self inject: 0 into: [:tally :each |
                        anObject = each
                                ifTrue: [tally _ tally + 1]
                                ifFalse: [tally]]


It seems to me that not using inject there might be reinventing the  
wheel. Please excuse my ignorance if I'm incorrect here, I'm trying  
to learn Smalltalk and get involved. Any ideas?

Regards,

Ray

Reply | Threaded
Open this post in threaded view
|

Re: Idea for a possibly better Collection occurrencesOf method.

Zulq Alam-2
Raymond W. Lucke IV wrote:

> It seems to me that not using inject there might be reinventing the
> wheel. Please excuse my ignorance if I'm incorrect here, I'm trying to
> learn Smalltalk and get involved. Any ideas?

Is duplication really a problem in this case? I don't think so. Since
you do, I'm curious what you think the risks are?

I do think it could be more readable though.

occurrencesOf: anObject
   "Answer how many of the receiver's elements are equal to anObject."

   ^ self count: [:each | each = anObject]

Thanks,
Zulq.

Reply | Threaded
Open this post in threaded view
|

Re: Idea for a possibly better Collection occurrencesOf method.

Raymond W. Lucke IV
Hi Zulq,

I didn't see any risks either way, I guess my thought was that it  
would be more elegant to not use the temporary variables. Your way is  
even more elegant. Is there any performance downside to using count  
or inject instead of do with a temporary variable? It would seem to  
me that it would be more efficient the way you just described as well.

Ray


On Sep 11, 2006, at 5:11 PM, Zulq Alam wrote:

> Raymond W. Lucke IV wrote:
>
>> It seems to me that not using inject there might be reinventing  
>> the wheel. Please excuse my ignorance if I'm incorrect here, I'm  
>> trying to learn Smalltalk and get involved. Any ideas?
>
> Is duplication really a problem in this case? I don't think so.  
> Since you do, I'm curious what you think the risks are?
>
> I do think it could be more readable though.
>
> occurrencesOf: anObject
>   "Answer how many of the receiver's elements are equal to anObject."
>
>   ^ self count: [:each | each = anObject]
>
> Thanks,
> Zulq.
>
>
> !DSPAM:4505fab4278191804284693!
>


Reply | Threaded
Open this post in threaded view
|

RE: Idea for a possibly better Collection occurrencesOf method.

Ramon Leon-5
In reply to this post by Raymond W. Lucke IV
> Is there any reason why it shouldn't look more like this:
>
> Collection >> occurrencesOf: anObject
> "Answer how many of the receiver's elements are equal
> to anObject."
>
> ^self inject: 0 into: [:tally :each |
> anObject = each
> ifTrue: [tally _ tally + 1]
> ifFalse: [tally]]
>
>
> Regards,
>
> Ray

Or..

Collection >> occurrencesOf: anObject
    "Answer how many of the receiver's elements are equal to anObject."
    ^self inject: 0 into: [:tally :each |
        anObject = each ifTrue: [tally + 1] ifFalse: [tally]]

No need to add modify tally, simply return it + 1.  There's lot's of old
code in the image, without a comment, it's impossible to know why
occurrencesOf: or count: wasn't used, maybe the author just didn't know
about them.  I wasn't aware of count: myself, but it should be implemented
with inject:into: as well, I'd imagine.  Though implementing with count:
does seem the simplest solution.


Reply | Threaded
Open this post in threaded view
|

Re: Idea for a possibly better Collection occurrencesOf method.

timrowledge
In reply to this post by Raymond W. Lucke IV
You might possibly be able to measure a small performance cost to  
using inject:into: because it passes down to a virtually identical  
loop (ie uses a temp, a do: loop etc) and costs you one extra message  
send or so.  If someone chose to write the compiler optimiser code to  
inline inject:into: then it would not cost even that.

Given that modern machines are typically able to send 3-10 million  
messages a second, it probably isn't worth anyone's effort.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Oxymorons: Airline Food



Reply | Threaded
Open this post in threaded view
|

Re: Idea for a possibly better Collection occurrencesOf method.

Raymond W. Lucke IV
In reply to this post by Ramon Leon-5
Heh. I should have caught that. :)

Anyway, I'm new to Squeak and Smalltalk, and I've gotta say, you guys have a really great system here. I'm highly impressed and trying to get my hands dirty with it as much as possible so I can learn. I've enjoyed the process of learning Smalltalk much more than any other language I've worked with thanks to the nice interface that Squeak presents.

Thanks for developing such a powerful and enjoyable Smalltalk implementation!

Ray

On Sep 11, 2006, at 5:34 PM, Ramon Leon wrote:

 anObject = each ifTrue: [tally + 1] ifFalse: [tally]]


No need to add modify tally, simply return it + 1.  There's lot's of old




Reply | Threaded
Open this post in threaded view
|

Re: Idea for a possibly better Collection occurrencesOf method.

Andreas.Raab
In reply to this post by timrowledge
tim Rowledge wrote:
> You might possibly be able to measure a small performance cost to using
> inject:into: because it passes down to a virtually identical loop (ie
> uses a temp, a do: loop etc) and costs you one extra message send or
> so.  If someone chose to write the compiler optimiser code to inline
> inject:into: then it would not cost even that.

It is also an extra activation per element of the collection (because
the ifTrue:-block in the original version gets inlined) which actually
shows in a quick benchmark:
   "Create a list with 1 million true/false elements"
   list := Array new: 1000000.
   1 to: list size do:[:i| list at: i put: i even].
   "Search for occurances of true by various means"
   anObject := true.
   tally := 0.
   time1 := [list do:[:each| anObject = each ifTrue:[tally := tally +
1]]] timeToRun.
   time2 := [list inject: 0 into: [:tally :each |
               anObject = each
                   ifTrue: [tally := tally + 1]
                   ifFalse: [tally]]] timeToRun.
   time3 := [list count:[:each| each = anObject]] timeToRun.
   { time1. time2. time3}

This results in #(397 590 600) on my machine with the significant
difference that being between ~400ms (original version) vs. ~600ms (with
extra activation).

Cheers,
   - Andreas

Reply | Threaded
Open this post in threaded view
|

Re: Idea for a possibly better Collection occurrencesOf method.

J J-6
In reply to this post by Raymond W. Lucke IV
Yea, I think count: or inject: would be better.  Maybe the occurencesOf:
predates those
methods and that's why they are not used?


>From: "Raymond W. Lucke IV" <[hidden email]>
>Reply-To: The general-purpose Squeak developers
>list<[hidden email]>
>To: The general-purpose Squeak developers
>list<[hidden email]>
>Subject: Re: Idea for a possibly better Collection occurrencesOf method.
>Date: Mon, 11 Sep 2006 17:29:56 -0700
>
>Hi Zulq,
>
>I didn't see any risks either way, I guess my thought was that it  would be
>more elegant to not use the temporary variables. Your way is  even more
>elegant. Is there any performance downside to using count  or inject
>instead of do with a temporary variable? It would seem to  me that it would
>be more efficient the way you just described as well.
>
>Ray
>
>
>On Sep 11, 2006, at 5:11 PM, Zulq Alam wrote:
>
>>Raymond W. Lucke IV wrote:
>>
>>>It seems to me that not using inject there might be reinventing  the
>>>wheel. Please excuse my ignorance if I'm incorrect here, I'm  trying to
>>>learn Smalltalk and get involved. Any ideas?
>>
>>Is duplication really a problem in this case? I don't think so.  Since you
>>do, I'm curious what you think the risks are?
>>
>>I do think it could be more readable though.
>>
>>occurrencesOf: anObject
>>   "Answer how many of the receiver's elements are equal to anObject."
>>
>>   ^ self count: [:each | each = anObject]
>>
>>Thanks,
>>Zulq.
>>
>>
>>!DSPAM:4505fab4278191804284693!
>>
>
>



Reply | Threaded
Open this post in threaded view
|

Re: Idea for a possibly better Collection occurrencesOf method.

J J-6
In reply to this post by timrowledge
I think it would be worth it to change this occurance, and any others to use
the library instead
of hand made counts like this.  That way if someone does make some kind of
optimization then
this function will get that too.


>From: tim Rowledge <[hidden email]>
>Reply-To: The general-purpose Squeak developers
>list<[hidden email]>
>To: The general-purpose Squeak developers
>list<[hidden email]>
>Subject: Re: Idea for a possibly better Collection occurrencesOf method.
>Date: Mon, 11 Sep 2006 17:41:15 -0700
>
>You might possibly be able to measure a small performance cost to  using
>inject:into: because it passes down to a virtually identical  loop (ie uses
>a temp, a do: loop etc) and costs you one extra message  send or so.  If
>someone chose to write the compiler optimiser code to  inline inject:into:
>then it would not cost even that.
>
>Given that modern machines are typically able to send 3-10 million  
>messages a second, it probably isn't worth anyone's effort.
>
>tim
>--
>tim Rowledge; [hidden email]; http://www.rowledge.org/tim
>Oxymorons: Airline Food
>
>
>



Reply | Threaded
Open this post in threaded view
|

Re: Idea for a possibly better Collection occurrencesOf method.

J J-6
In reply to this post by Andreas.Raab
That's really unfortunate that using the library is slowing then rolling
your own. :(


>From: Andreas Raab <[hidden email]>
>Reply-To: The general-purpose Squeak developers
>list<[hidden email]>
>To: The general-purpose Squeak developers
>list<[hidden email]>
>Subject: Re: Idea for a possibly better Collection occurrencesOf method.
>Date: Mon, 11 Sep 2006 22:03:12 -0400
>
>tim Rowledge wrote:
>>You might possibly be able to measure a small performance cost to using
>>inject:into: because it passes down to a virtually identical loop (ie uses
>>a temp, a do: loop etc) and costs you one extra message send or so.  If
>>someone chose to write the compiler optimiser code to inline inject:into:
>>then it would not cost even that.
>
>It is also an extra activation per element of the collection (because the
>ifTrue:-block in the original version gets inlined) which actually shows in
>a quick benchmark:
>   "Create a list with 1 million true/false elements"
>   list := Array new: 1000000.
>   1 to: list size do:[:i| list at: i put: i even].
>   "Search for occurances of true by various means"
>   anObject := true.
>   tally := 0.
>   time1 := [list do:[:each| anObject = each ifTrue:[tally := tally + 1]]]
>timeToRun.
>   time2 := [list inject: 0 into: [:tally :each |
>               anObject = each
>                   ifTrue: [tally := tally + 1]
>                   ifFalse: [tally]]] timeToRun.
>   time3 := [list count:[:each| each = anObject]] timeToRun.
>   { time1. time2. time3}
>
>This results in #(397 590 600) on my machine with the significant
>difference that being between ~400ms (original version) vs. ~600ms (with
>extra activation).
>
>Cheers,
>   - Andreas
>



Reply | Threaded
Open this post in threaded view
|

Re: Idea for a possibly better Collection occurrencesOf method.

Ramon Leon-4
>>
>> It is also an extra activation per element of the collection (because
>> the ifTrue:-block in the original version gets inlined) which actually
>> shows in a quick benchmark:
>>   "Create a list with 1 million true/false elements"
>>   list := Array new: 1000000.
>>   1 to: list size do:[:i| list at: i put: i even].
>>   "Search for occurances of true by various means"
>>   anObject := true.
>>   tally := 0.
>>   time1 := [list do:[:each| anObject = each ifTrue:[tally := tally +
>> 1]]] timeToRun.
>>   time2 := [list inject: 0 into: [:tally :each |
>>               anObject = each
>>                   ifTrue: [tally := tally + 1]
>>                   ifFalse: [tally]]] timeToRun.
>>   time3 := [list count:[:each| each = anObject]] timeToRun.
>>   { time1. time2. time3}
>>
>> This results in #(397 590 600) on my machine with the significant
>> difference that being between ~400ms (original version) vs. ~600ms
>> (with extra activation).
>>
>> Cheers,
>>   - Andreas

There's an unnecessary assignment in your inject into test. It should be...

  time2 := [list inject: 0 into: [:tally :each |
               anObject = each
                   ifTrue: [tally + 1]
                   ifFalse: [tally]]] timeToRun.

Not that it makes much difference in the benchmark.


Reply | Threaded
Open this post in threaded view
|

Re: Idea for a possibly better Collection occurrencesOf method.

Klaus D. Witzel
In reply to this post by J J-6
Hi J J,

mind to explain your comment "that's really unfortunate that using the  
library is slowing".

Whatever "better" method is used, they all have to send the #do: message  
and so:

- #do: must be fastest
- users of #do: must be slower than pure #do:

Only under (very) rare conditions can *more* code yield *less* execution  
time, I posted one of them as a puzzle some time ago

-  
http://lists.squeakfoundation.org/pipermail/squeak-dev/2006-July/106462.html

/Klaus

On Tue, 12 Sep 2006 08:45:32 +0200, J J wrote:

> That's really unfortunate that using the library is slowing then rolling  
> your own. :(


Reply | Threaded
Open this post in threaded view
|

Re: Idea for a possibly better Collection occurrencesOf method.

Ramon Leon-4

>> That's really unfortunate that using the library is slowing then
>> rolling  your own. :(

Don't forget the other side, using the library means you don't have to
remember how to roll your own.


Reply | Threaded
Open this post in threaded view
|

Re: Idea for a possibly better Collection occurrencesOf method.

Nicolas Cellier-3
In reply to this post by Raymond W. Lucke IV

It's all right if you reduce a loop budget from N*N to N.
But just do not spend to much time on smaller optimization like inlining code.
Exupery should do it for you in near future.
It's better to keep code small and clean with inject:into: and count:
the reason for not using count: is probably historical (no count: message in st80).

Nicolas


________________________________________________________________________
iFRANCE, exprimez-vous !
http://web.ifrance.com


Reply | Threaded
Open this post in threaded view
|

Re: Idea for a possibly better Collection occurrencesOf method.

Jon Hylands
In reply to this post by Klaus D. Witzel
On Tue, 12 Sep 2006 09:31:40 +0200, "Klaus D. Witzel"
<[hidden email]> wrote:

> Whatever "better" method is used, they all have to send the #do: message  
> and so:
>
> - #do: must be fastest
> - users of #do: must be slower than pure #do:

A while ago, the guys at IBM did a very interesting performance upgrade to
VisualAge Smalltalk - they "fixed" the standard enumeration methods so they
all have (more or less) identical performance. They wrote a collection
iterator primitive method in BlockContext:

apply: aCollection from: start to: end

This method calls an iterator primitive, and all the collection iterators
end up calling this method, and it has taken away pretty much all the
performance differences between using #to:do:, #do:, #select:,
#inject:into:, etc.

It used to be that the only way to get fast collection iteration was to use
#to:do: (because it is inlined). Now they all run at the same speed, which
is really fast.

Later,
Jon

--------------------------------------------------------------
   Jon Hylands      [hidden email]      http://www.huv.com/jon

  Project: Micro Seeker (Micro Autonomous Underwater Vehicle)
           http://www.huv.com

Reply | Threaded
Open this post in threaded view
|

Re: Idea for a possibly better Collection occurrencesOf method.

stephane ducasse-2
In reply to this post by Andreas.Raab
At ESUG georg Heeg made an interesting point, he showed that in VW  
loading an st file over 1000 only 1 was actually the compiler works  
(I do not remember exactly the ratio but it was striking). They  
aggressively optimized the loading and (removed the refreshing of  
RB.....).

So this was really making me think that speed is still really again  
an issue and that good optimized libraries are cool
even on our fast machines because we could do even more.

> This results in #(397 590 600) on my machine with the significant  
> difference that being between ~400ms (original version) vs. ~600ms  
> (with extra activation).

So this is clearly a good argument for do: :)

Stef

Reply | Threaded
Open this post in threaded view
|

Re: Idea for a possibly better Collection occurrencesOf method.

Klaus D. Witzel
In reply to this post by Jon Hylands
On Tue, 12 Sep 2006 12:19:02 +0200, Jon Hylands wrote:

> On Tue, 12 Sep 2006 09:31:40 +0200, "Klaus D. Witzel" wrote:
>
>> Whatever "better" method is used, they all have to send the #do: message
>> and so:
>>
>> - #do: must be fastest
>> - users of #do: must be slower than pure #do:
>
> A while ago, the guys at IBM did a very interesting performance upgrade  
> to
> VisualAge Smalltalk - they "fixed" the standard enumeration methods so  
> they
> all have (more or less) identical performance. They wrote a collection
> iterator primitive method in BlockContext:
>
> apply: aCollection from: start to: end

Heh, why haven't we stupid non-corporate users come up with such a  
brilliant idea ;-)

IMO this is (perhaps) something for Exupery.

> This method calls an iterator primitive, and all the collection iterators
> end up calling this method, and it has taken away pretty much all the
> performance differences between using #to:do:, #do:, #select:,
> #inject:into:, etc.
>
> It used to be that the only way to get fast collection iteration was to  
> use
> #to:do: (because it is inlined). Now they all run at the same speed,  
> which
> is really fast.

Thank you for sharing the find, Jon.

/Klaus

> Later,
> Jon


Reply | Threaded
Open this post in threaded view
|

Re: Idea for a possibly better Collection occurrencesOf method.

J J-6
In reply to this post by Klaus D. Witzel
I misspelled "slower" (was thinking rolling and typing slower. :) ).

The point is, if using the library is noticably slower then just rolling
your own then
some people will always use there own.

Look at C++ for an example of what I mean.
std::string has been around for some years now, but it seems like every new
C++ framework
or substancial library has a "FrameWorkOrLibrary::String" defined.  Why?  
Performance.  The
standard library string doesn't have the performance they want accross the
board so every
library writer ever rolls there own.  As a result if you try to be a good
programmer and reuse
code you will end up writing a bunch of stupid conversion functions to all
these types.

I know we are just talking about 2 methods here, but it's a bad president to
set.  Would the
implimentation Jon mentioned work, or what are the bad points?  The debugger
wouldn't let
you walk the code then or?

Thanks

>From: "Klaus D. Witzel" <[hidden email]>
>Reply-To: The general-purpose Squeak developers
>list<[hidden email]>
>To: [hidden email]
>Subject: Re: Idea for a possibly better Collection occurrencesOf method.
>Date: Tue, 12 Sep 2006 09:31:40 +0200
>
>Hi J J,
>
>mind to explain your comment "that's really unfortunate that using the  
>library is slowing".
>
>Whatever "better" method is used, they all have to send the #do: message  
>and so:
>
>- #do: must be fastest
>- users of #do: must be slower than pure #do:
>
>Only under (very) rare conditions can *more* code yield *less* execution  
>time, I posted one of them as a puzzle some time ago
>
>-  
>http://lists.squeakfoundation.org/pipermail/squeak-dev/2006-July/106462.html
>
>/Klaus
>
>On Tue, 12 Sep 2006 08:45:32 +0200, J J wrote:
>
>>That's really unfortunate that using the library is slowing then rolling  
>>your own. :(
>
>



Reply | Threaded
Open this post in threaded view
|

Re: Idea for a possibly better Collection occurrencesOf method.

Klaus D. Witzel
On Tue, 12 Sep 2006 13:12:23 +0200, J J wrote:

> I misspelled "slower" (was thinking rolling and typing slower. :) ).

Ah, back in sync again (that's what I expected :)

> The point is, if using the library is noticably slower then just rolling  
> your own then
> some people will always use there own.
> Look at C++ for an example of what I mean.
...
> Why?  Performance.
...
> you will end up writing a bunch of stupid conversion functions to all  
> these types.

Yes, as you said: lack of reuse(able code).

> I know we are just talking about 2 methods here,

... at the heart of the implementation of sets (like: numbers, magnitudes,  
etc) in Smalltalk ...

> but it's a bad president to set.  Would the
> implimentation Jon mentioned work,

100%[tm] perfect. This a typical Smalltalk solution. For an analogous  
concept have a look at what the VM does with #at: and #at:put: (I mean the  
atPutCache).

> or what are the bad points?

Zero (IMHO) iff (!) we'd write comments on why and how this new  
#apply:from:to: must be as it is ;-)

> The debugger wouldn't let
> you walk the code then or?

Zero impact: the VM, when executing the #apply:from:to: primitive, will  
send messages to your "do:" block and the debugger is (can be, dependent  
on the implementation) in full control. Of course one can no longer change  
the from and to arguments in the debugger, but: I could not care less 8-)

/Klaus

> Thanks


Reply | Threaded
Open this post in threaded view
|

Re: Idea for a possibly better Collection occurrencesOf method.

J J-6
Nice, sounds like a done deal then.  Will this make 3.9 release? :)


>From: "Klaus D. Witzel" <[hidden email]>
>Reply-To: The general-purpose Squeak developers
>list<[hidden email]>
>To: [hidden email]
>Subject: Re: Idea for a possibly better Collection occurrencesOf method.
>Date: Tue, 12 Sep 2006 13:33:43 +0200
>I  could not care less 8-)
>

I'm glad you say this phrase right.  Drives me crazy when people use "I
could care less" and I hear it so often I was starting to think I was the
only one to get it right. :)



123