Issue 3779 in pharo: isUnicodeStringWithCJK returns false on a string containing Kanji


pharo
Status: New
Owner: ----

New issue 3779 by [hidden email]: isUnicodeStringWithCJK returns false  
on a string containing Kanji
http://code.google.com/p/pharo/issues/detail?id=3779

Pharo image: Pharo
Pharo core version: Pharo1.2rc2 build #12336
Virtual machine used: Pharo 1.2's One-click Cog

Steps to reproduce:

'In Japanese, Japanese is written 日本語' isUnicodeStringWithCJK. " returns false "
'日本語' isUnicodeStringWithCJK. " also returns false "

I'm pretty sure it should return true, since both strings contain Kanji.

Contrast the behavior with:

'In Japanese, Japanese is written 日本語' anySatisfy: [ :c | Unicode isUnifiedKanji: c charCode ]. " returns true "

I think the cause is that #isUnicodeStringWithCJK calls both  
#isUnifiedKanji: and #isTraditionalDomestic and, to me at least, it looks  
like #isTraditionalDomestic is wrong.

My guess is that #isTraditionalDomestic did something sensible back in
Squeak during the transition to Unicode WideStrings, but is now simply
testing the wrong thing. It is hard to tell without knowing the history, though.

My suggestion is to rewrite #isUnicodeStringWithCJK purely in terms of

     anySatisfy: [ :c | Unicode isUnifiedKanji: c charCode ]

unless someone can figure out when the system is using an EncodedCharSet
subclass that isn't Unicode. Unfortunately, I don't currently have any
understanding of where EncodedCharSets fit into the system. Hopefully someone
who knows more about them than I do can decide this straight away.
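
For illustration, the suggested rewrite could be sketched as a method roughly like the one below. This is only a sketch, not the attached fix: it assumes the method lives on String (the `String>>` header is just the usual way of naming the class and selector, not code to evaluate), and it uses nothing beyond the #anySatisfy:, #isUnifiedKanji: and #charCode calls already shown above.

```smalltalk
String>>isUnicodeStringWithCJK
	"Sketch only: answer true if any character in the receiver is a
	 unified Kanji according to Unicode. Assumes the method is defined
	 on String and drops the #isTraditionalDomestic check entirely."

	^ self anySatisfy: [ :c | Unicode isUnifiedKanji: c charCode ]
```

With a definition along these lines, both expressions from the steps to reproduce would be expected to answer true, and a plain ASCII string would still answer false.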


Re: Issue 3779 in pharo: isUnicodeStringWithCJK returns false on a string containing Kanji

pharo

Comment #1 on issue 3779 by [hidden email]: isUnicodeStringWithCJK  
returns false on a string containing Kanji
http://code.google.com/p/pharo/issues/detail?id=3779

The comment says #isTraditionalDomestic is only for backward compatibility  
in Squeak context (loading old projects...).

I think you are right; in Pharo this should just be removed.


Re: Issue 3779 in pharo: isUnicodeStringWithCJK returns false on a string containing Kanji

pharo
Updates:
        Status: FixProposed
        Labels: Milestone-1.3

Comment #2 on issue 3779 by [hidden email]: isUnicodeStringWithCJK  
returns false on a string containing Kanji
http://code.google.com/p/pharo/issues/detail?id=3779

Fix attached.

Attachments:
        Fix.1.cs  2.6 KB


Re: Issue 3779 in pharo: isUnicodeStringWithCJK returns false on a string containing Kanji

pharo
Updates:
        Status: closed

Comment #3 on issue 3779 by [hidden email]: isUnicodeStringWithCJK  
returns false on a string containing Kanji
http://code.google.com/p/pharo/issues/detail?id=3779

in 13116