Login  Register

Re: The Trunk: Collections-eem.792.mcz

Posted by Nicolas Cellier on May 03, 2018; 8:48pm
URL: https://forum.world.st/The-Trunk-Collections-eem-792-mcz-tp5075599p5075725.html

Just for musing with stupid implementations:

CharacterSet>>ascii
    "return a set containing just the ASCII characters"
    ^Ascii ifNil: [ Ascii := self newFrom: ((1 to: 127) collect: [:code | code asCharacter]) ]
CharacterSet>>nonAscii
    ^NonAscii ifNil: [ NonAscii := self ascii complement ].

| bsa |
bsa := ByteString allInstances.
{
[bsa do: [:e | e isAsciiString]] bench.
[bsa do: [:e | (CharacterSet nonAscii findFirstInByteString: e startingAt: 1) = 0]] bench.
}.
 #('2.88 per second. 348 milliseconds per run.' '36.1 per second. 27.7 milliseconds per run.')

So we could do this for ByteString:

ByteString>>isAsciiString
    ^(CharacterSet nonAscii findFirstInByteString: self startingAt: 1) = 0

But WideString requires another hack...



2018-05-03 22:11 GMT+02:00 karl ramberg <[hidden email]>:


On Thu, May 3, 2018 at 10:09 PM, karl ramberg <[hidden email]> wrote:
I'm all for faster code in the standard library.
One could leave the old version in the comment as an example for other ways of writing it.

Which he already did.
Why complain about it then :-(

Best,
Karl

 
Best,
Karl


On Thu, May 3, 2018 at 7:51 PM, Chris Cunningham <[hidden email]> wrote:


On Thu, May 3, 2018 at 10:34 AM, Eliot Miranda <[hidden email]> wrote:


On Thu, May 3, 2018 at 10:12 AM, Chris Muller <[hidden email]> wrote:
+1.  Not only that, why the call to #basicSize and #basicAt:.  Really?

It's like...  not even Smalltalk anymore...   :(

What?  This is a workhorse method, and you want to create a block, and apply the block to each element, and convert each element from a character to an integer, when basicAt: yields (and has always yielded) integers in strings?  You think that one line with a block and a high-level iterator is so much better than a two line with a simpler inner loop?
 
| strings nothing allSatisfy basicAt |
strings := String allSubInstances.
strings := strings, strings, strings, strings, strings.
nothing := [:s|].
allSatisfy := [:s| s isAsciiStringWithAllSatisfy].
basicAt := [:s| s isAsciiStringWithBasicAt].
Smalltalk garbageCollect.
1 to: 2 do:
[:ign| times := { nothing. allSatisfy. basicAt } collect: [:block| [strings do: block] timeToRun]].
(times second - times first / (times third - times first) asFloat) roundTo: 0.1

Answers range from 5.1 to 5.7 on 64-bits Mac OS X.

You really want that elegance for something that is 5 times slower but one line shorter?  I did this because I saw Patrick inline isAsciiString in mail message processing, and I thought a) his method would read better if it used isAsciiString, and b) isAsciiString can be implemented much better without affecting clients, so I went ahead.  Part of the beauty of Smalltalk is that clients are insulated from implementation so that implementations can be improved.  Since when has that not been a part of Smalltalk?

While I understand (and like) the idea of having nice, clean, elegant methods everywhere in Squeak, I also would like it to be super fast.  I think Eliot's change is a nice example straddling that line - note the elegant way, and then show (and implement) a fast way to do the same thing still just using Smalltalk.

Not to mention that he actually commented the method with what it should do.

-cbc