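see issues 17302/17242/17227
String>>findString:startingAt:caseSensitive: appears to be failing for extended charsets
String>>compare:caseSensitive: seems to be failing for extended charset comparisons
String>>beginsWithEmpty:caseSensitive: has test failure for some cases

the problem is, the standard character set used for building the CaseInsensitiveOrder map
only maps characters from the set of ASCII characters, but it is used in the findString/compare/beginsWith methods for all byte characters.

Any objections if we fill this map as suggested in case 17242?

CaseInsensitiveOrder := AsciiOrder copy.
(0 to: 255) do: [ :v |
    | char upper |
    char := v asCharacter.
    upper := char asUppercase.
    upper isOctetCharacter
        ifFalse: [ upper := char ].
    CaseInsensitiveOrder at: char asciiValue + 1 put: (CaseInsensitiveOrder at: upper asciiValue + 1) ].

(the check for #isOctetCharacter is needed because for some entries the corresponding
uppercase character is not within this character set).

This would solve all three issues.

nicolai
|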
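The isOctetCharacter guard in the snippet above can be explored directly. The following is a minimal playground sketch, not part of the proposed change: it lists the byte characters whose uppercase form falls outside the 0-255 range. The variable name `outside` is hypothetical, and the result depends on the case-mapping tables of the image it runs in.

```smalltalk
"Playground sketch: which byte characters have an uppercase form
 that is no longer an octet character? These are the entries for
 which the proposed loop keeps the character itself."
| outside |
outside := (0 to: 255) select: [ :v |
	v asCharacter asUppercase isOctetCharacter not ].
"Answer the characters themselves rather than their code points."
outside collect: [ :v | v asCharacter ]
```

Any character answered here would otherwise index past the end of the 256-entry table.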
Hallo Nicolai,
> On 06 Jan 2016, at 09:58, Nicolai Hess <[hidden email]> wrote:
>
> see issues 17302/17242/17227
> String>>findString:startingAt:caseSensitive: appears to be failing for extended charsets
> String>>compare:caseSensitive: seems to be failing for extended charset comparisons
> String>>beginsWithEmpty:caseSensitive: has test failure for some cases
>
> the problem is, the standard character set used for building the CaseInsensitiveOrder map
> only maps characters from the set of ASCII characters, but it is used in the findString/compare/beginsWith methods for all byte characters.
>
> Any objections if we fill this map as suggested in case 17242?
>
> CaseInsensitiveOrder := AsciiOrder copy.
> (0 to: 255) do: [ :v |
>     | char upper |
>     char := v asCharacter.
>     upper := char asUppercase.
>     upper isOctetCharacter
>         ifFalse: [ upper := char ].
>     CaseInsensitiveOrder at: char asciiValue + 1 put: (CaseInsensitiveOrder at: upper asciiValue + 1) ].
>
> (the check for #isOctetCharacter is needed because for some entries the corresponding
> uppercase character is not within this character set).
>
> This would solve all three issues.
>
> nicolai

That looks like a beautiful fix that makes perfect sense.
If all tests are green, I see no reason not to do it.

Thanks and well done (again),

Sven
|
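For readers who want to try the three reported cases by hand, here is a small playground sketch using the selectors named in the issues. The sample strings and variable names are illustrative assumptions, not taken from the issue reports or the test suite.

```smalltalk
"Playground sketch (Pharo). The sample strings are made up for illustration."
| comparison prefixOk index |
comparison := 'ÉCOLE' compare: 'école' caseSensitive: false.
prefixOk := 'École normale' beginsWithEmpty: 'éc' caseSensitive: false.
index := 'grande école' findString: 'ÉCOLE' startingAt: 1 caseSensitive: false.
"Inspect the three answers together."
{ comparison. prefixOk. index }
```

With a table that also folds the non-ASCII byte characters, one would expect the comparison to report the strings as equal, the prefix check to answer true, and the search to answer a non-zero index.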
> On 06 Jan 2016, at 10:09, Sven Van Caekenberghe <[hidden email]> wrote:
>
> Hallo Nicolai,
>
> That looks like a beautiful fix that makes perfect sense.
> If all tests are green, I see no reason not to do it.
>
> Thanks and well done (again),
>
> Sven

I was about to suggest copying the CaseSensitiveOrder mapping instead of the AsciiOrder, since its ordering is more refined than just A-Z. But that would quickly lead to wanting to extend it to a generic Latin1 sort order rather than just ASCII (é between e and f, for example), which is a can of worms that is hard to solve without making the ordering locale-specific...

I mean, one could use the default Unicode ordering, but would inevitably receive complaints from, say, Norwegians, that å sorts between a and b instead of after z.

After all, it only affects the case where compare is used for ordering anyway.

Cheers,
Henry
|
In reply to this post by Sven Van Caekenberghe-2
2016-01-06 10:09 GMT+01:00 Sven Van Caekenberghe <[hidden email]>:
> Hallo Nicolai,

Thanks for your feedback.

> Thanks and well done (again),

:)
|
In reply to this post by Henrik Sperre Johansen
2016-01-06 11:09 GMT+01:00 Henrik Johansen <[hidden email]>:
Interesting, good idea: there are no uppercase characters without a lowercase counterpart.
I didn't think about ordering..., and I think I don't want to :)
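The remark about uppercase and lowercase counterparts can be checked in a playground. This is a minimal sketch under the assumption that "without a lowercase counterpart" means "whose lowercase form leaves the byte range"; the answer depends on the case-mapping tables of the image.

```smalltalk
"Playground sketch: does every byte character have a lowercase form
 that is still an octet character?"
(0 to: 255) allSatisfy: [ :v |
	v asCharacter asLowercase isOctetCharacter ]
```

If this answers true, a table folded towards lowercase would not need the isOctetCharacter guard.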
|
In reply to this post by Nicolai Hess-3-2
It is cool to have you three on board. I love your discussions because
I'm learning by immersion.

On 6/1/16 09:58, Nicolai Hess wrote:
>
> see issues 17302/17242/17227
> String>>findString:startingAt:caseSensitive: appears to be failing for extended charsets
> String>>compare:caseSensitive: seems to be failing for extended charset comparisons
> String>>beginsWithEmpty:caseSensitive: has test failure for some cases
>
> the problem is, the standard character set used for building the CaseInsensitiveOrder map
> only maps characters from the set of ASCII characters, but it is used in the findString/compare/beginsWith methods for all byte characters.
>
> Any objections if we fill this map as suggested in case 17242?
>
> CaseInsensitiveOrder := AsciiOrder copy.
> (0 to: 255) do: [ :v |
>     | char upper |
>     char := v asCharacter.
>     upper := char asUppercase.
>     upper isOctetCharacter
>         ifFalse: [ upper := char ].
>     CaseInsensitiveOrder at: char asciiValue + 1 put: (CaseInsensitiveOrder at: upper asciiValue + 1) ].
>
> (the check for #isOctetCharacter is needed because for some entries the corresponding
> uppercase character is not within this character set).
>
> This would solve all three issues.
>
> nicolai
|