String>>match issue

String>>match issue

vmusulainen-2
The author has deleted this message.

Re: String>>match issue

Ben Coman
vmusulainen wrote:

> Hi!
>
> From the comment of the #match: method:
>
> match: text
> "Answer whether text matches the pattern in this string.
> Matching ignores upper/lower case differences.
>
> Check it now:
> 1. 'V' match: 'v' -> true "OK, it's fine"
> 2. 'Ш' match: 'ш' -> false "Oops, fails with non-English (Cyrillic) letters"
>
> -regards
> Vladimir Musulainen
>
>
>
> --
> View this message in context: http://forum.world.st/String-match-issue-tp4748497.html
> Sent from the Pharo Smalltalk Users mailing list archive at Nabble.com.

If you debug ('Ш' match: 'ш') and trace through to
WideString>>findSubstring:in:startingAt:matchTable:
you will find that (c1 asciiValue) --> 1096 and (c2 asciiValue) --> 1064,
but (matchTable size) --> 256.
Since the character being compared is not in the matchTable, the
comparison value defaults to (c1 asciiValue + 1), so no case folding
happens and the two characters compare as different.
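
To make the fallback concrete, here is a small Workspace sketch. It is not the actual WideString code: comparisonValue is a hypothetical helper block, and the #inTable symbol is only a placeholder for a real table lookup. It assumes the same fallback is applied to both characters.

```smalltalk
"Sketch only: mimics the fallback used when a character is outside the table."
| tableSize comparisonValue small capital |
tableSize := 256.                          "size of the matchTable seen in the debugger"
comparisonValue := [:char |
	char asInteger < tableSize
		ifTrue: [#inTable]                 "placeholder: the real code looks up the table here"
		ifFalse: [char asInteger + 1]].    "fallback: no case folding at all"
small := Character value: 1096.
capital := Character value: 1064.
comparisonValue value: small.              "1097"
comparisonValue value: capital.            "1065"
(comparisonValue value: small) = (comparisonValue value: capital).   "false"
```

The characters are built with Character value: rather than typed as literals, so the snippet contains no non-ASCII source text.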

So, as a proof of concept, change this...
String class>>initialize
    "Widen the table beyond its original 256 entries"
    CaseInsensitiveOrder := (Array new: 2000) fillFrom: AsciiOrder with: #value.   "<--MODIFIED"
    ($a to: $z) do:
        [:c | CaseInsensitiveOrder at: c asciiValue + 1
                put: (CaseInsensitiveOrder at: c asUppercase asciiValue + 1)].
    CaseInsensitiveOrder at: 1096+1 put: 1096.   "<--ADDED: entry for code point 1096 (lowercase)"
    CaseInsensitiveOrder at: 1064+1 put: 1096.   "<--ADDED: code point 1064 (uppercase) gets the same value"

then in Workspace evaluate "String initialize"
and now ('Ш' match: 'ш') --> true.

I'm not sure of the best way to handle that long term.

btw, you may be tempted to use ($ш asciiValue) in place of 1096 in the
code, but maybe (I'm not sure) there is a problem saving an image
containing Unicode characters.

Maybe String's class variables CaseInsensitiveOrder & CaseSensitiveOrder
would be better handled as individual classes,
to provide flexibility for other sort orders like
CaseInsensitiveGermanPhonebook [1]. Probably
String>>findString:startingAt:caseSensitive: should also double-dispatch
and be overridden by WideString.
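
As a rough illustration of comparing through a pluggable folding step instead of a fixed 256-entry table, here is a Workspace sketch. The foldedEquals block is hypothetical and not part of String or WideString. Whether it answers true for non-ASCII letters depends on the Unicode case mappings available to Character>>asLowercase in your image, which I have not checked.

```smalltalk
"Sketch only: fold both characters with a replaceable block before comparing."
| fold foldedEquals capital small |
fold := [:char | char asLowercase].        "could be swapped for another folding rule"
foldedEquals := [:a :b | (fold value: a) = (fold value: b)].
capital := Character value: 1064.
small := Character value: 1096.
foldedEquals value: capital value: small.
foldedEquals value: $V value: $v.          "true"
```

A class per sort order, as suggested above, would play the role of the fold block here.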

cheers -ben

[1] http://userguide.icu-project.org/collation
[2] http://www.w3.org/International/wiki/Case_folding
[3] http://cldr.unicode.org/index/cldr-spec/collation-guidelines