Hi Phillipe,
On Mon, May 26, 2014 at 12:51 AM, Philippe Marschall <[hidden email]> wrote: Hi Why not rewrite String>>= aString "Answer whether the receiver sorts equally as aString.
The collation order is simple ascii (with case differences)." aString isString ifFalse: [ ^ false ].
^ (self compare: self with: aString collated: AsciiOrder) = 2 as String>>= aString "Answer whether the receiver sorts equally as aString.
The collation order is simple ascii (with case differences)." (aString isString and: [self size = aString size]) ifFalse: [^false]. ^ (self compare: self withSize: with: aString collated: AsciiOrder) = 2 ? -- best, Eliot
|
Quoting Eliot Miranda <[hidden email]>:
BTW, any good reason for not prefixing all the implementors of #= with this? Cheers,
Juan Vuletich |
On Tue, May 27, 2014 at 6:54 AM, J. Vuletich (mail lists) <[hidden email]> wrote: --
This makes a huge difference, over 3 times faster: | bs t1 t2 | bs := ByteString allInstances first: 10000.
t1 := [bs do: [:a| bs do: [:b| a = b]]] timeToRun. (FileStream fileNamed: '/Users/eliot/Squeak/Squeak4.5/String-=.st') fileIn. t2 := [bs do: [:a| bs do: [:b| a = b]]] timeToRun.
{ t1. t2 } #(13726 4467) 4467 - 13726 / 137.26 -67.46%
It doesn't make much difference: | bs t1 t2 | bs := ByteString allInstances first: 10000.
t1 := [bs do: [:a| bs do: [:b| a = b]]] timeToRun. (FileStream fileNamed: '/Users/eliot/Squeak/Squeak4.5/String-=.st') fileIn. t2 := [bs do: [:a| bs do: [:b| a = b]]] timeToRun. { t1. t2 } #(4628 4560)
4560 - 4628 / 46.28 -1.47% So is it worth it? If you feel it is I've no objection other than it feels a little kludgey for such little benefit. And there are the Symbols if one needs quick comparison and can bear the cost of slow interning.
best, Eliot
|
> On Tue, May 27, 2014 at 6:54 AM, J. Vuletich (mail lists) <[hidden email]> wrote: >> >> Quoting Eliot Miranda <[hidden email]>: >> >> Hi Phillipe, >> >> >> On Mon, May 26, 2014 at 12:51 AM, Philippe Marschall <[hidden email]> wrote: >>> >>> Hi >>> >>> I have been investigating why Dictionary look up performance with String keys is not as good as I would expected. Something I noted is that String >> #= is implemented in terms of #compare:with:collated:. There is no short circuit if Strings are not the same size. In my case some Strings have the same prefix but a different length eg 'Content-Type' and 'Content-Length'. In that case a #compare:with:collated: is performed even though we know in advance the answer will be false because they have different sizes. >> >> >> Why not rewrite >> >> String>>= aString >> "Answer whether the receiver sorts equally as aString. >> The collation order is simple ascii (with case differences)." >> aString isString ifFalse: [ ^ false ]. >> ^ (self compare: self with: aString collated: AsciiOrder) = 2 >> >> as >> >> String>>= aString >> "Answer whether the receiver sorts equally as aString. >> The collation order is simple ascii (with case differences)." >> >> (aString isString >> and: [self size = aString size]) ifFalse: [^false]. >> ^ (self compare: self withSize: with: aString collated: AsciiOrder) = 2 >> >> ? > > > This makes a huge difference, over 3 times faster: > > | bs t1 t2 | > bs := ByteString allInstances first: 10000. > t1 := [bs do: [:a| bs do: [:b| a = b]]] timeToRun. > (FileStream fileNamed: '/Users/eliot/Squeak/Squeak4.5/String-=.st') fileIn. > t2 := [bs do: [:a| bs do: [:b| a = b]]] timeToRun. > { t1. t2 } #(13726 4467) > 4467 - 13726 / 137.26 -67.46% >> >> >> One /could/ add a replacement compare:with:collated: primitive primitiveCompareString which took the sizes as arguments to avoid asking twice. But it wouldn't be safe. One could abuse the primitive and lie about the size. So I suspect it is best to add the size check to String>>#= and accept the duplication of the primitive finding the sizes of the two strings. The cost in the primitive is minimal. A WideString version of the primitive might pay its way, but if Spur and Sista arrive soon the primitive shouldn't be faster than the optimised Smalltalk code. >> -- >> best, >> Eliot >> >> BTW, any good reason for not prefixing all the implementors of #= with this? >> >> "Any object is equal to itself" >> self == argument ifTrue: [ ^ true ]. > > > It doesn't make much difference: > > | bs t1 t2 | > bs := ByteString allInstances first: 10000. > t1 := [bs do: [:a| bs do: [:b| a = b]]] timeToRun. > (FileStream fileNamed: '/Users/eliot/Squeak/Squeak4.5/String-=.st') fileIn. > t2 := [bs do: [:a| bs do: [:b| a = b]]] timeToRun. > { t1. t2 } #(4628 4560) > > 4560 - 4628 / 46.28 -1.47% > > So is it worth it? If you feel it is I've no objection other than it feels a little kludgey for such little benefit. That is a very common convention in the image, not kludgey. String>>#= should advertise, by its implementation, its desire and intent for maximum performance. Not covering this check conveys a lack of thoroghness and/or dedication to performance. Future readers of the method will wonder why such a simple avoidance of unnecessary processing for that case wasn't taken. By the time they wonder whether it was an oversight they've already expended too much thought about it. We should really consider Andres' question before popping this into trunk though.. > And there are the Symbols if one needs quick comparison and can bear the cost of slow interning. > -- > best, > Eliot > |
Free forum by Nabble | Edit this page |