Hello,
I am new to the Squeak community, and new to Smalltalk itself. I have been going through the browser looking at how things work and stumbled upon the occurrences of method of Collection: Collection >> occurrencesOf: anObject "Answer how many of the receiver's elements are equal to anObject." | tally | tally _ 0. self do: [:each | anObject = each ifTrue: [tally _ tally + 1]]. ^tally Is there any reason why it shouldn't look more like this: Collection >> occurrencesOf: anObject "Answer how many of the receiver's elements are equal to anObject." ^self inject: 0 into: [:tally :each | anObject = each ifTrue: [tally _ tally + 1] ifFalse: [tally]] It seems to me that not using inject there might be reinventing the wheel. Please excuse my ignorance if I'm incorrect here, I'm trying to learn Smalltalk and get involved. Any ideas? Regards, Ray |
Raymond W. Lucke IV wrote:
> It seems to me that not using inject there might be reinventing the > wheel. Please excuse my ignorance if I'm incorrect here, I'm trying to > learn Smalltalk and get involved. Any ideas? Is duplication really a problem in this case? I don't think so. Since you do, I'm curious what you think the risks are? I do think it could be more readable though. occurrencesOf: anObject "Answer how many of the receiver's elements are equal to anObject." ^ self count: [:each | each = anObject] Thanks, Zulq. |
Hi Zulq,
I didn't see any risks either way, I guess my thought was that it would be more elegant to not use the temporary variables. Your way is even more elegant. Is there any performance downside to using count or inject instead of do with a temporary variable? It would seem to me that it would be more efficient the way you just described as well. Ray On Sep 11, 2006, at 5:11 PM, Zulq Alam wrote: > Raymond W. Lucke IV wrote: > >> It seems to me that not using inject there might be reinventing >> the wheel. Please excuse my ignorance if I'm incorrect here, I'm >> trying to learn Smalltalk and get involved. Any ideas? > > Is duplication really a problem in this case? I don't think so. > Since you do, I'm curious what you think the risks are? > > I do think it could be more readable though. > > occurrencesOf: anObject > "Answer how many of the receiver's elements are equal to anObject." > > ^ self count: [:each | each = anObject] > > Thanks, > Zulq. > > > !DSPAM:4505fab4278191804284693! > |
In reply to this post by Raymond W. Lucke IV
> Is there any reason why it shouldn't look more like this:
> > Collection >> occurrencesOf: anObject > "Answer how many of the receiver's elements are equal > to anObject." > > ^self inject: 0 into: [:tally :each | > anObject = each > ifTrue: [tally _ tally + 1] > ifFalse: [tally]] > > > Regards, > > Ray Or.. Collection >> occurrencesOf: anObject "Answer how many of the receiver's elements are equal to anObject." ^self inject: 0 into: [:tally :each | anObject = each ifTrue: [tally + 1] ifFalse: [tally]] No need to add modify tally, simply return it + 1. There's lot's of old code in the image, without a comment, it's impossible to know why occurrencesOf: or count: wasn't used, maybe the author just didn't know about them. I wasn't aware of count: myself, but it should be implemented with inject:into: as well, I'd imagine. Though implementing with count: does seem the simplest solution. |
In reply to this post by Raymond W. Lucke IV
You might possibly be able to measure a small performance cost to
using inject:into: because it passes down to a virtually identical loop (ie uses a temp, a do: loop etc) and costs you one extra message send or so. If someone chose to write the compiler optimiser code to inline inject:into: then it would not cost even that. Given that modern machines are typically able to send 3-10 million messages a second, it probably isn't worth anyone's effort. tim -- tim Rowledge; [hidden email]; http://www.rowledge.org/tim Oxymorons: Airline Food |
In reply to this post by Ramon Leon-5
Heh. I should have caught that. :)
Anyway, I'm new to Squeak and Smalltalk, and I've gotta say, you guys have a really great system here. I'm highly impressed and trying to get my hands dirty with it as much as possible so I can learn. I've enjoyed the process of learning Smalltalk much more than any other language I've worked with thanks to the nice interface that Squeak presents. Thanks for developing such a powerful and enjoyable Smalltalk implementation! Ray On Sep 11, 2006, at 5:34 PM, Ramon Leon wrote:
|
In reply to this post by timrowledge
tim Rowledge wrote:
> You might possibly be able to measure a small performance cost to using > inject:into: because it passes down to a virtually identical loop (ie > uses a temp, a do: loop etc) and costs you one extra message send or > so. If someone chose to write the compiler optimiser code to inline > inject:into: then it would not cost even that. It is also an extra activation per element of the collection (because the ifTrue:-block in the original version gets inlined) which actually shows in a quick benchmark: "Create a list with 1 million true/false elements" list := Array new: 1000000. 1 to: list size do:[:i| list at: i put: i even]. "Search for occurances of true by various means" anObject := true. tally := 0. time1 := [list do:[:each| anObject = each ifTrue:[tally := tally + 1]]] timeToRun. time2 := [list inject: 0 into: [:tally :each | anObject = each ifTrue: [tally := tally + 1] ifFalse: [tally]]] timeToRun. time3 := [list count:[:each| each = anObject]] timeToRun. { time1. time2. time3} This results in #(397 590 600) on my machine with the significant difference that being between ~400ms (original version) vs. ~600ms (with extra activation). Cheers, - Andreas |
In reply to this post by Raymond W. Lucke IV
Yea, I think count: or inject: would be better. Maybe the occurencesOf:
predates those methods and that's why they are not used? >From: "Raymond W. Lucke IV" <[hidden email]> >Reply-To: The general-purpose Squeak developers >list<[hidden email]> >To: The general-purpose Squeak developers >list<[hidden email]> >Subject: Re: Idea for a possibly better Collection occurrencesOf method. >Date: Mon, 11 Sep 2006 17:29:56 -0700 > >Hi Zulq, > >I didn't see any risks either way, I guess my thought was that it would be >more elegant to not use the temporary variables. Your way is even more >elegant. Is there any performance downside to using count or inject >instead of do with a temporary variable? It would seem to me that it would >be more efficient the way you just described as well. > >Ray > > >On Sep 11, 2006, at 5:11 PM, Zulq Alam wrote: > >>Raymond W. Lucke IV wrote: >> >>>It seems to me that not using inject there might be reinventing the >>>wheel. Please excuse my ignorance if I'm incorrect here, I'm trying to >>>learn Smalltalk and get involved. Any ideas? >> >>Is duplication really a problem in this case? I don't think so. Since you >>do, I'm curious what you think the risks are? >> >>I do think it could be more readable though. >> >>occurrencesOf: anObject >> "Answer how many of the receiver's elements are equal to anObject." >> >> ^ self count: [:each | each = anObject] >> >>Thanks, >>Zulq. >> >> >>!DSPAM:4505fab4278191804284693! >> > > |
In reply to this post by timrowledge
I think it would be worth it to change this occurance, and any others to use
the library instead of hand made counts like this. That way if someone does make some kind of optimization then this function will get that too. >From: tim Rowledge <[hidden email]> >Reply-To: The general-purpose Squeak developers >list<[hidden email]> >To: The general-purpose Squeak developers >list<[hidden email]> >Subject: Re: Idea for a possibly better Collection occurrencesOf method. >Date: Mon, 11 Sep 2006 17:41:15 -0700 > >You might possibly be able to measure a small performance cost to using >inject:into: because it passes down to a virtually identical loop (ie uses >a temp, a do: loop etc) and costs you one extra message send or so. If >someone chose to write the compiler optimiser code to inline inject:into: >then it would not cost even that. > >Given that modern machines are typically able to send 3-10 million >messages a second, it probably isn't worth anyone's effort. > >tim >-- >tim Rowledge; [hidden email]; http://www.rowledge.org/tim >Oxymorons: Airline Food > > > |
In reply to this post by Andreas.Raab
That's really unfortunate that using the library is slowing then rolling
your own. :( >From: Andreas Raab <[hidden email]> >Reply-To: The general-purpose Squeak developers >list<[hidden email]> >To: The general-purpose Squeak developers >list<[hidden email]> >Subject: Re: Idea for a possibly better Collection occurrencesOf method. >Date: Mon, 11 Sep 2006 22:03:12 -0400 > >tim Rowledge wrote: >>You might possibly be able to measure a small performance cost to using >>inject:into: because it passes down to a virtually identical loop (ie uses >>a temp, a do: loop etc) and costs you one extra message send or so. If >>someone chose to write the compiler optimiser code to inline inject:into: >>then it would not cost even that. > >It is also an extra activation per element of the collection (because the >ifTrue:-block in the original version gets inlined) which actually shows in >a quick benchmark: > "Create a list with 1 million true/false elements" > list := Array new: 1000000. > 1 to: list size do:[:i| list at: i put: i even]. > "Search for occurances of true by various means" > anObject := true. > tally := 0. > time1 := [list do:[:each| anObject = each ifTrue:[tally := tally + 1]]] >timeToRun. > time2 := [list inject: 0 into: [:tally :each | > anObject = each > ifTrue: [tally := tally + 1] > ifFalse: [tally]]] timeToRun. > time3 := [list count:[:each| each = anObject]] timeToRun. > { time1. time2. time3} > >This results in #(397 590 600) on my machine with the significant >difference that being between ~400ms (original version) vs. ~600ms (with >extra activation). > >Cheers, > - Andreas > |
>>
>> It is also an extra activation per element of the collection (because >> the ifTrue:-block in the original version gets inlined) which actually >> shows in a quick benchmark: >> "Create a list with 1 million true/false elements" >> list := Array new: 1000000. >> 1 to: list size do:[:i| list at: i put: i even]. >> "Search for occurances of true by various means" >> anObject := true. >> tally := 0. >> time1 := [list do:[:each| anObject = each ifTrue:[tally := tally + >> 1]]] timeToRun. >> time2 := [list inject: 0 into: [:tally :each | >> anObject = each >> ifTrue: [tally := tally + 1] >> ifFalse: [tally]]] timeToRun. >> time3 := [list count:[:each| each = anObject]] timeToRun. >> { time1. time2. time3} >> >> This results in #(397 590 600) on my machine with the significant >> difference that being between ~400ms (original version) vs. ~600ms >> (with extra activation). >> >> Cheers, >> - Andreas There's an unnecessary assignment in your inject into test. It should be... time2 := [list inject: 0 into: [:tally :each | anObject = each ifTrue: [tally + 1] ifFalse: [tally]]] timeToRun. Not that it makes much difference in the benchmark. |
In reply to this post by J J-6
Hi J J,
mind to explain your comment "that's really unfortunate that using the library is slowing". Whatever "better" method is used, they all have to send the #do: message and so: - #do: must be fastest - users of #do: must be slower than pure #do: Only under (very) rare conditions can *more* code yield *less* execution time, I posted one of them as a puzzle some time ago - http://lists.squeakfoundation.org/pipermail/squeak-dev/2006-July/106462.html /Klaus On Tue, 12 Sep 2006 08:45:32 +0200, J J wrote: > That's really unfortunate that using the library is slowing then rolling > your own. :( |
>> That's really unfortunate that using the library is slowing then >> rolling your own. :( Don't forget the other side, using the library means you don't have to remember how to roll your own. |
In reply to this post by Raymond W. Lucke IV
It's all right if you reduce a loop budget from N*N to N. But just do not spend to much time on smaller optimization like inlining code. Exupery should do it for you in near future. It's better to keep code small and clean with inject:into: and count: the reason for not using count: is probably historical (no count: message in st80). Nicolas ________________________________________________________________________ iFRANCE, exprimez-vous ! http://web.ifrance.com |
In reply to this post by Klaus D. Witzel
On Tue, 12 Sep 2006 09:31:40 +0200, "Klaus D. Witzel"
<[hidden email]> wrote: > Whatever "better" method is used, they all have to send the #do: message > and so: > > - #do: must be fastest > - users of #do: must be slower than pure #do: A while ago, the guys at IBM did a very interesting performance upgrade to VisualAge Smalltalk - they "fixed" the standard enumeration methods so they all have (more or less) identical performance. They wrote a collection iterator primitive method in BlockContext: apply: aCollection from: start to: end This method calls an iterator primitive, and all the collection iterators end up calling this method, and it has taken away pretty much all the performance differences between using #to:do:, #do:, #select:, #inject:into:, etc. It used to be that the only way to get fast collection iteration was to use #to:do: (because it is inlined). Now they all run at the same speed, which is really fast. Later, Jon -------------------------------------------------------------- Jon Hylands [hidden email] http://www.huv.com/jon Project: Micro Seeker (Micro Autonomous Underwater Vehicle) http://www.huv.com |
In reply to this post by Andreas.Raab
At ESUG georg Heeg made an interesting point, he showed that in VW
loading an st file over 1000 only 1 was actually the compiler works (I do not remember exactly the ratio but it was striking). They aggressively optimized the loading and (removed the refreshing of RB.....). So this was really making me think that speed is still really again an issue and that good optimized libraries are cool even on our fast machines because we could do even more. > This results in #(397 590 600) on my machine with the significant > difference that being between ~400ms (original version) vs. ~600ms > (with extra activation). So this is clearly a good argument for do: :) Stef |
In reply to this post by Jon Hylands
On Tue, 12 Sep 2006 12:19:02 +0200, Jon Hylands wrote:
> On Tue, 12 Sep 2006 09:31:40 +0200, "Klaus D. Witzel" wrote: > >> Whatever "better" method is used, they all have to send the #do: message >> and so: >> >> - #do: must be fastest >> - users of #do: must be slower than pure #do: > > A while ago, the guys at IBM did a very interesting performance upgrade > to > VisualAge Smalltalk - they "fixed" the standard enumeration methods so > they > all have (more or less) identical performance. They wrote a collection > iterator primitive method in BlockContext: > > apply: aCollection from: start to: end Heh, why haven't we stupid non-corporate users come up with such a brilliant idea ;-) IMO this is (perhaps) something for Exupery. > This method calls an iterator primitive, and all the collection iterators > end up calling this method, and it has taken away pretty much all the > performance differences between using #to:do:, #do:, #select:, > #inject:into:, etc. > > It used to be that the only way to get fast collection iteration was to > use > #to:do: (because it is inlined). Now they all run at the same speed, > which > is really fast. Thank you for sharing the find, Jon. /Klaus > Later, > Jon |
In reply to this post by Klaus D. Witzel
I misspelled "slower" (was thinking rolling and typing slower. :) ).
The point is, if using the library is noticably slower then just rolling your own then some people will always use there own. Look at C++ for an example of what I mean. std::string has been around for some years now, but it seems like every new C++ framework or substancial library has a "FrameWorkOrLibrary::String" defined. Why? Performance. The standard library string doesn't have the performance they want accross the board so every library writer ever rolls there own. As a result if you try to be a good programmer and reuse code you will end up writing a bunch of stupid conversion functions to all these types. I know we are just talking about 2 methods here, but it's a bad president to set. Would the implimentation Jon mentioned work, or what are the bad points? The debugger wouldn't let you walk the code then or? Thanks >From: "Klaus D. Witzel" <[hidden email]> >Reply-To: The general-purpose Squeak developers >list<[hidden email]> >To: [hidden email] >Subject: Re: Idea for a possibly better Collection occurrencesOf method. >Date: Tue, 12 Sep 2006 09:31:40 +0200 > >Hi J J, > >mind to explain your comment "that's really unfortunate that using the >library is slowing". > >Whatever "better" method is used, they all have to send the #do: message >and so: > >- #do: must be fastest >- users of #do: must be slower than pure #do: > >Only under (very) rare conditions can *more* code yield *less* execution >time, I posted one of them as a puzzle some time ago > >- >http://lists.squeakfoundation.org/pipermail/squeak-dev/2006-July/106462.html > >/Klaus > >On Tue, 12 Sep 2006 08:45:32 +0200, J J wrote: > >>That's really unfortunate that using the library is slowing then rolling >>your own. :( > > |
On Tue, 12 Sep 2006 13:12:23 +0200, J J wrote:
> I misspelled "slower" (was thinking rolling and typing slower. :) ). Ah, back in sync again (that's what I expected :) > The point is, if using the library is noticably slower then just rolling > your own then > some people will always use there own. > Look at C++ for an example of what I mean. ... > Why? Performance. ... > you will end up writing a bunch of stupid conversion functions to all > these types. Yes, as you said: lack of reuse(able code). > I know we are just talking about 2 methods here, ... at the heart of the implementation of sets (like: numbers, magnitudes, etc) in Smalltalk ... > but it's a bad president to set. Would the > implimentation Jon mentioned work, 100%[tm] perfect. This a typical Smalltalk solution. For an analogous concept have a look at what the VM does with #at: and #at:put: (I mean the atPutCache). > or what are the bad points? Zero (IMHO) iff (!) we'd write comments on why and how this new #apply:from:to: must be as it is ;-) > The debugger wouldn't let > you walk the code then or? Zero impact: the VM, when executing the #apply:from:to: primitive, will send messages to your "do:" block and the debugger is (can be, dependent on the implementation) in full control. Of course one can no longer change the from and to arguments in the debugger, but: I could not care less 8-) /Klaus > Thanks |
Nice, sounds like a done deal then. Will this make 3.9 release? :)
>From: "Klaus D. Witzel" <[hidden email]> >Reply-To: The general-purpose Squeak developers >list<[hidden email]> >To: [hidden email] >Subject: Re: Idea for a possibly better Collection occurrencesOf method. >Date: Tue, 12 Sep 2006 13:33:43 +0200 >I could not care less 8-) > I'm glad you say this phrase right. Drives me crazy when people use "I could care less" and I hear it so often I was starting to think I was the only one to get it right. :) |
Free forum by Nabble | Edit this page |