Status: New
Owner: ----
New issue 3779 by
[hidden email]: isUnicodeStringWithCJK returns false
on a string containing Kanji
http://code.google.com/p/pharo/issues/detail?id=3779
Pharo image: Pharo
Pharo core version: Pharo1.2rc2 build #12336
Virtual machine used: Pharo 1.2's One-click Cog
Steps to reproduce:
'In Japanese, Japanese is written 日本語' isUnicodeStringWithCJK. " returns false "
'日本語' isUnicodeStringWithCJK. " also returns false "
I'm pretty sure it should return true, since both strings contain Kanji.
Contrast the behavior with:
'In Japanese, Japanese is written 日本語' anySatisfy: [ :c | Unicode
isUnifiedKanji: c charCode ]. " returns true "
I think the cause is that #isUnicodeStringWithCJK calls both
#isUnifiedKanji: and #isTraditionalDomestic and, to me at least, it looks
like #isTraditionalDomestic is wrong.
My guess is that #isTraditionalDomestic did something sensible back in
Squeak during the transition to Unicode WideStrings, but is now simply
testing the wrong thing. That is hard to tell without knowing the history,
though.
My suggestion is to rewrite #isUnicodeStringWithCJK purely in terms of
anySatisfy: [ :c | Unicode isUnifiedKanji: c charCode ]
unless someone can figure out when the system is using an EncodedCharSet
subclass that isn't Unicode. Unfortunately, I don't currently have any
understanding of where EncodedCharSets fit into the system. Hopefully someone
who knows more about them than I do can decide this straight away.
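
For concreteness, here is a minimal sketch of what the suggested rewrite
might look like as a method on String. It assumes that dropping the
#isTraditionalDomestic check is acceptable, which is exactly the open
question above, so treat it as an illustration rather than a vetted fix.
The "String >> selector" header is only the usual way of writing a method
in text; in the browser you would enter the body on the String class.

```smalltalk
String >> isUnicodeStringWithCJK
	"Sketch only: answer true if any character in the receiver falls in
	 the range tested by Unicode class>>isUnifiedKanji:. This assumes the
	 #isTraditionalDomestic check can be removed."
	^ self anySatisfy: [ :c | Unicode isUnifiedKanji: c charCode ]
```

With that definition, both expressions from the steps to reproduce should
answer true, since the anySatisfy: expression already does so when
evaluated directly:

```smalltalk
'In Japanese, Japanese is written 日本語' isUnicodeStringWithCJK.
'日本語' isUnicodeStringWithCJK.
```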