Eliot Miranda uploaded a new version of Collections to project The Trunk:
http://source.squeak.org/trunk/Collections-eem.792.mcz ==================== Summary ==================== Name: Collections-eem.792 Author: eem Time: 3 May 2018, 12:55:52.175146 am UUID: 7d8995ed-835e-44b0-bf4c-0b0780f5c96f Ancestors: Collections-pre.791 Four times faster implementation of isAsciiString. =============== Diff against Collections-pre.791 =============== Item was changed: ----- Method: String>>isAsciiString (in category 'testing') ----- isAsciiString + "Answer if the receiver contains only ascii characters. + Inline ^self allSatisfy: [ :each | each asciiValue <= 127 ] for speed." + 1 to: self basicSize do: [:i| (self basicAt: i) > 127 ifTrue: [^false]]. + ^true! - - ^self allSatisfy: [ :each | each asciiValue <= 127 ]! |
Hi Eliot
> On 03.05.2018, at 09:56, [hidden email] wrote: > > Eliot Miranda uploaded a new version of Collections to project The Trunk: > http://source.squeak.org/trunk/Collections-eem.792.mcz > > ==================== Summary ==================== > > Name: Collections-eem.792 > Author: eem > Time: 3 May 2018, 12:55:52.175146 am > UUID: 7d8995ed-835e-44b0-bf4c-0b0780f5c96f > Ancestors: Collections-pre.791 > > Four times faster implementation of isAsciiString. > > =============== Diff against Collections-pre.791 =============== > > Item was changed: > ----- Method: String>>isAsciiString (in category 'testing') ----- > isAsciiString > + "Answer if the receiver contains only ascii characters. > + Inline ^self allSatisfy: [ :each | each asciiValue <= 127 ] for speed." > + 1 to: self basicSize do: [:i| (self basicAt: i) > 127 ifTrue: [^false]]. > + ^true! > - > - ^self allSatisfy: [ :each | each asciiValue <= 127 ]! > > Although I am in awe of the performance improvement, I am curious wether it really pays of to inline #allSatify: here, in terms of performance vs. readability. I presume that, given the current use of #isAsciiString, we can stay with the more compact, readable version that does not need the 'caveat lector'-comment at the beginning of the method. Best regards -Tobias |
+1. Not only that, why the call to #basicSize and #basicAt:. Really?
It's like... not even Smalltalk anymore... :( On Thu, May 3, 2018 at 3:41 AM, Tobias Pape <[hidden email]> wrote: > Hi Eliot > >> On 03.05.2018, at 09:56, [hidden email] wrote: >> >> Eliot Miranda uploaded a new version of Collections to project The Trunk: >> http://source.squeak.org/trunk/Collections-eem.792.mcz >> >> ==================== Summary ==================== >> >> Name: Collections-eem.792 >> Author: eem >> Time: 3 May 2018, 12:55:52.175146 am >> UUID: 7d8995ed-835e-44b0-bf4c-0b0780f5c96f >> Ancestors: Collections-pre.791 >> >> Four times faster implementation of isAsciiString. >> >> =============== Diff against Collections-pre.791 =============== >> >> Item was changed: >> ----- Method: String>>isAsciiString (in category 'testing') ----- >> isAsciiString >> + "Answer if the receiver contains only ascii characters. >> + Inline ^self allSatisfy: [ :each | each asciiValue <= 127 ] for speed." >> + 1 to: self basicSize do: [:i| (self basicAt: i) > 127 ifTrue: [^false]]. >> + ^true! >> - >> - ^self allSatisfy: [ :each | each asciiValue <= 127 ]! >> >> > > Although I am in awe of the performance improvement, I am curious wether it really pays of to inline #allSatify: here, in terms of performance vs. readability. > > I presume that, given the current use of #isAsciiString, we can stay with the more compact, readable version that does not need the 'caveat lector'-comment at the beginning of the method. > > > Best regards > -Tobias > > |
On Thu, May 3, 2018 at 10:12 AM, Chris Muller <[hidden email]> wrote: +1. Not only that, why the call to #basicSize and #basicAt:. Really? What? This is a workhorse method, and you want to create a block, and apply the block to each element, and convert each element from a character to an integer, when basicAt: yields (and has always yielded) integers in strings? You think that one line with a block and a high-level iterator is so much better than a two line with a simpler inner loop? | strings nothing allSatisfy basicAt | strings := String allSubInstances. strings := strings, strings, strings, strings, strings. nothing := [:s|]. allSatisfy := [:s| s isAsciiStringWithAllSatisfy]. basicAt := [:s| s isAsciiStringWithBasicAt]. Smalltalk garbageCollect. 1 to: 2 do: [:ign| times := { nothing. allSatisfy. basicAt } collect: [:block| [strings do: block] timeToRun]]. (times second - times first / (times third - times first) asFloat) roundTo: 0.1 Answers range from 5.1 to 5.7 on 64-bits Mac OS X. You really want that elegance for something that is 5 times slower but one line shorter? I did this because I saw Patrick inline isAsciiString in mail message processing, and I thought a) his method would read better if it used isAsciiString, and b) isAsciiString can be implemented much better without affecting clients, so I went ahead. Part of the beauty of Smalltalk is that clients are insulated from implementation so that implementations can be improved. Since when has that not been a part of Smalltalk?
_,,,^..^,,,_ best, Eliot |
On Thu, May 3, 2018 at 10:34 AM, Eliot Miranda <[hidden email]> wrote:
While I understand (and like) the idea of having nice, clean, elegant methods everywhere in Squeak, I also would like it to be super fast. I think Eliot's change is a nice example straddling that line - note the elegant way, and then show (and implement) a fast way to do the same thing still just using Smalltalk. Not to mention that he actually commented the method with what it should do. -cbc |
In reply to this post by Eliot Miranda-2
Hi Eliot
> On 03.05.2018, at 19:34, Eliot Miranda <[hidden email]> wrote: > > This is a workhorse method Just a question for clarification, what is a workhorse method? Best regards -Tobias |
Workhorse - meaning it takes on a lot of work, is heavily used. So, I looked around the image, and it is used by the clipboard AND it is also used in basically any type of file-out of code. On Thu, May 3, 2018 at 10:54 AM, Tobias Pape <[hidden email]> wrote: Hi Eliot |
In reply to this post by Tobias Pape
On Thu, May 3, 2018 at 10:54 AM, Tobias Pape <[hidden email]> wrote: Hi Eliot Workhorse, noun - a horse used for work on a farm. - a person or machine that dependably performs hard work over a long period of time: "he was a workhorse of an actor, often appearing in as many as forty plays in a year". _,,,^..^,,,_ best, Eliot |
In reply to this post by Chris Muller-3
Uh, I'm glad it wasn't me who rewrote this method. I would have used a
primitive instead to make it even more not-even-Smalltalk-anymore. If you don't like this kind of code, you'd better not look under the hood too much. Especially avoid sneaking a peek at Kernel and Collections if you can. :) On a serious note, libary methods, like that method, ought to be fast, because you can't know in what situation they will be used. Saying that it's not even used is therefore not a valid reason. Levente On Thu, 3 May 2018, Chris Muller wrote: > +1. Not only that, why the call to #basicSize and #basicAt:. Really? > > It's like... not even Smalltalk anymore... :( > > On Thu, May 3, 2018 at 3:41 AM, Tobias Pape <[hidden email]> wrote: >> Hi Eliot >> >>> On 03.05.2018, at 09:56, [hidden email] wrote: >>> >>> Eliot Miranda uploaded a new version of Collections to project The Trunk: >>> http://source.squeak.org/trunk/Collections-eem.792.mcz >>> >>> ==================== Summary ==================== >>> >>> Name: Collections-eem.792 >>> Author: eem >>> Time: 3 May 2018, 12:55:52.175146 am >>> UUID: 7d8995ed-835e-44b0-bf4c-0b0780f5c96f >>> Ancestors: Collections-pre.791 >>> >>> Four times faster implementation of isAsciiString. >>> >>> =============== Diff against Collections-pre.791 =============== >>> >>> Item was changed: >>> ----- Method: String>>isAsciiString (in category 'testing') ----- >>> isAsciiString >>> + "Answer if the receiver contains only ascii characters. >>> + Inline ^self allSatisfy: [ :each | each asciiValue <= 127 ] for speed." >>> + 1 to: self basicSize do: [:i| (self basicAt: i) > 127 ifTrue: [^false]]. >>> + ^true! >>> - >>> - ^self allSatisfy: [ :each | each asciiValue <= 127 ]! >>> >>> >> >> Although I am in awe of the performance improvement, I am curious wether it really pays of to inline #allSatify: here, in terms of performance vs. readability. >> >> I presume that, given the current use of #isAsciiString, we can stay with the more compact, readable version that does not need the 'caveat lector'-comment at the beginning of the method. >> >> >> Best regards >> -Tobias >> >> |
In reply to this post by Eliot Miranda-2
> On 03.05.2018, at 20:00, Eliot Miranda <[hidden email]> wrote: > > > > On Thu, May 3, 2018 at 10:54 AM, Tobias Pape <[hidden email]> wrote: > Hi Eliot > > On 03.05.2018, at 19:34, Eliot Miranda <[hidden email]> wrote: > > > > This is a workhorse method > > Just a question for clarification, what is a workhorse method? > > Workhorse, noun > - a horse used for work on a farm. > - a person or machine that dependably performs hard work over a long period of time: "he was a workhorse of an actor, often appearing in as many as forty plays in a year". > Ok, so I got that right. I saw it as exactly the opposite: only used rarely, and in non-performance-critical parts, such as Clipboard d o . O b I am puzzled. -t > > > _,,,^..^,,,_ > best, Eliot > |
In reply to this post by Eliot Miranda-2
Are we really arguing about the use of #basicAt: and #basicSize right now ???? Really ? Really ?!?!?!? These methods are there for one reason : speed. If we find them "confusing", we really got a problem! There are 127 senders of #basicAt: and 125 senders of #basicSize in the current Squeak 6.0 image... ----------------- Benoît St-Jean Yahoo! Messenger: bstjean Twitter: @BenLeChialeux Pinterest: benoitstjean Instagram: Chef_Benito IRC: lamneth Blogue: endormitoire.wordpress.com "A standpoint is an intellectual horizon of radius zero". (A. Einstein)
On Thursday, May 3, 2018, 1:34:17 p.m. EDT, Eliot Miranda <[hidden email]> wrote:
On Thu, May 3, 2018 at 10:12 AM, Chris Muller <[hidden email]> wrote: +1. Not only that, why the call to #basicSize and #basicAt:. Really? What? This is a workhorse method, and you want to create a block, and apply the block to each element, and convert each element from a character to an integer, when basicAt: yields (and has always yielded) integers in strings? You think that one line with a block and a high-level iterator is so much better than a two line with a simpler inner loop? | strings nothing allSatisfy basicAt | strings := String allSubInstances. strings := strings, strings, strings, strings, strings. nothing := [:s|]. allSatisfy := [:s| s isAsciiStringWithAllSatisfy]. basicAt := [:s| s isAsciiStringWithBasicAt]. Smalltalk garbageCollect. 1 to: 2 do: [:ign| times := { nothing. allSatisfy. basicAt } collect: [:block| [strings do: block] timeToRun]]. (times second - times first / (times third - times first) asFloat) roundTo: 0.1 Answers range from 5.1 to 5.7 on 64-bits Mac OS X. You really want that elegance for something that is 5 times slower but one line shorter? I did this because I saw Patrick inline isAsciiString in mail message processing, and I thought a) his method would read better if it used isAsciiString, and b) isAsciiString can be implemented much better without affecting clients, so I went ahead. Part of the beauty of Smalltalk is that clients are insulated from implementation so that implementations can be improved. Since when has that not been a part of Smalltalk?
_,,,^..^,,,_ best, Eliot |
In reply to this post by Eliot Miranda-2
> You really want that elegance for something that is 5 times slower but one
> line shorter? What I want is for Squeak to be a system that is inviting to both developers and users, but for that to work, the former have to make some minor sacrifices to accomodate the latter... > I did this because I saw Patrick inline isAsciiString in mail > message processing, ... and so sometimes it's best to leave the in-lining out in the application(s) that need it. As Tobias pointed out, the in-image uses WE have for this method are not performance critical, so if no one can notice a performance difference, then this change is effectively just a conversion of code in the class library from something that was developer AND user consumable, to one that is only developer consumable. > and I thought a) his method would read better if it used > isAsciiString, This is an argument for readability, but the audience of readers that will likely encounter this application-specific method is far fewer than the ones that the core class-library method. > and b) isAsciiString can be implemented much better without > affecting clients, so I went ahead. Part of the beauty of Smalltalk is that > clients are insulated from implementation so that implementations can be > improved. Since when has that not been a part of Smalltalk? "Insulated from implementation" is part of the beauty of modularity and object-oriented programming present in many languages. Smalltalk is the one that does that but with a simplicity and user-readable syntax. _____ We all like performance AND readability (and that's what I thought your dynamic-inlining would bring us -- the best of both worlds, did it not happen?). I'm all for in-lining the REAL workhorse methods (low-level) that truly get called thousands of times every minute, but for the others, I ask if you would please consider the Dynabook dream, where high-school kids are reading methods, assembling behaviors for their own needs, etc. They need speed AND readability. We need both. - Chris |
In reply to this post by cbc
I'm all for faster code in the standard library. One could leave the old version in the comment as an example for other ways of writing it. Best, Karl On Thu, May 3, 2018 at 7:51 PM, Chris Cunningham <[hidden email]> wrote:
|
On Thu, May 3, 2018 at 10:09 PM, karl ramberg <[hidden email]> wrote:
Which he already did. Why complain about it then :-( Best, Karl
|
+1 for leave change and close discussion.
All the best, Ron Teitelbaum |
In reply to this post by Karl Ramberg
Just for musing with stupid implementations: CharacterSet>>ascii "return a set containing just the ASCII characters" ^Ascii ifNil: [ Ascii := self newFrom: ((1 to: 127) collect: [:code | code asCharacter]) ] CharacterSet>>nonAscii ^NonAscii ifNil: [ NonAscii := self ascii complement ]. | bsa | bsa := ByteString allInstances. { [bsa do: [:e | e isAsciiString]] bench. [bsa do: [:e | (CharacterSet nonAscii findFirstInByteString: e startingAt: 1) = 0]] bench. }. #('2.88 per second. 348 milliseconds per run.' '36.1 per second. 27.7 milliseconds per run.') So we could do this for ByteString: ByteString>>isAsciiString ^(CharacterSet nonAscii findFirstInByteString: self startingAt: 1) = 0 But WideString requires another hack... 2018-05-03 22:11 GMT+02:00 karl ramberg <[hidden email]>:
|
In reply to this post by Squeak - Dev mailing list
Benoit St-Jean asked:
> Are we really arguing about the use of #basicAt: and > #basicSize right now ???? Really ? Really ?!?!?!? > > These methods are there for one reason : speed. If we > find them "confusing", we really got a problem! My impression had always been that these methods exist to work around the limitation of only having "self" and "super" when you need to strip away all polymorphism and somehow access "the base truth". That they are also a speedy short cut is a nice side effect. A more proper reflection mechanism would eliminate the first need for these methods while a more sophisticated compiler would eliminate the second need. Both of these were goals of the Self project. -- Jecel |
In reply to this post by Levente Uzonyi
> On a serious note, libary methods, like that method, ought to be fast,
Every method in the image is a library method, so what do you mean by "like" that method? Below, you said can't base on the situation(s) it's currently used, so are there any methods in the image which *shouldn't* be so desperately optimized? I'm just trying to figure out your basis for saying "ought" above, about fast but not readability. Are those not the personal priorities of an expert developer? What about the priorities of the students and users in the Dynabook dream? > because you can't know in what situation they will be used. Saying that it's > not even used is therefore not a valid reason. That's only true for the very lowest-level methods. Generally, it's better to design and code for what's known, and NOT try to code for the future / unknown. IF and when that future comes, it'd be easy enough for you to deal with any performance issues at THAT time. In the meantime, we "ought" :) to keep as much beautiful minimal elegance as we can. |
On Thu, 3 May 2018, Chris Muller wrote:
>> On a serious note, libary methods, like that method, ought to be fast, > > Every method in the image is a library method, so what do you mean by Really? Explain me how Browser >> #defaultBrowserTitle, a method I chose randomly, is a library method? > "like" that method? Below, you said can't base on the situation(s) > it's currently used, so are there any methods in the image which > *shouldn't* be so desperately optimized? I'm just trying to figure > out your basis for saying "ought" above, about fast but not > readability. Are those not the personal priorities of an expert > developer? What about the priorities of the students and users in the > Dynabook dream? Others have explained why the readability argument is not very convicing. About your other argument, I also find it rather weak. If Eliot's code is too complex for "users", then how could they cope with the complexity of #anySatisfy:? > >> because you can't know in what situation they will be used. Saying that it's >> not even used is therefore not a valid reason. > > That's only true for the very lowest-level methods. Generally, it's > better to design and code for what's known, and NOT try to code for > the future / unknown. IF and when that future comes, it'd be easy > enough for you to deal with any performance issues at THAT time. In > the meantime, we "ought" :) to keep as much beautiful minimal > elegance as we can. So, if I decided to write a mail transfer agent, and I found that optimizing #isAsciiString would bump the throughput by, let's say, 10%, then and only then would I be allowed to change the implementation in Squeak? Or should I keep my optimized version in my image, because it's not considered generally useful? Levente |
In reply to this post by Nicolas Cellier
> On 03.05.2018, at 22:48, Nicolas Cellier <[hidden email]> wrote: > > > But WideString requires another hack... Like ^false ? :D -t |
Free forum by Nabble | Edit this page |