Hi all,
Now that Squeak 4.1 supports Chinese at the code level, I tried to read the attached file, which contains Chinese characters, but the Chinese characters showed up as ??.
Note: run
(MultiByteFileStream fileNamed: 'file.txt') contents
in the workspace.
Does anyone know anything about this?
Thanks and regards,
Eric Gao
file.txt (16 bytes)
|
On Thursday 27 Jan 2011 3:22:37 pm Eric wrote:
> Now Squeak 4.1 has supported Chinese from Code level, I have tried to read
> a file containing Chinese character from attached file, it showed up ??
> for chinese character.
>
> Note: run
>
> (MultiByteFileStream fileNamed: 'file.txt') contents
> in the workspace
>
> Any one know some about it?
LanguageEnvironment defaultSystemConverter determines the text converter used for reading text files for your environment. Inspect
(MultiByteFileStream fileNamed: 'file.txt') converter
and make sure it is a UTF8TextConverter. If not, you may need to override the converter before reading its contents, e.g.
(MultiByteFileStream fileNamed: 'file.txt') converter: (UTF8TextConverter new); contents.
If the characters are being decoded correctly, then you may need specific language support (fonts, rendering engine etc.) for your dialect. See the language support classes in Multilingual-Languages for more details.
Subbu |
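To see why the choice of converter matters, here is a minimal Python sketch (not Squeak code, and not from the thread) that decodes the same bytes with a matching and a mismatched codec. The sample text '中文' is an assumption for illustration; the contents of the attached file.txt are not known.

```python
# Sample text is an assumption; it is NOT the content of the attached file.txt.
text = "中文"

utf8_bytes = text.encode("utf-8")    # three bytes per character for this text
gb_bytes = text.encode("gb2312")     # two bytes per character for this text


def decode(data: bytes, encoding: str) -> str:
    """Decode bytes, substituting U+FFFD wherever a sequence is invalid."""
    return data.decode(encoding, errors="replace")


matching = decode(utf8_bytes, "utf-8")        # bytes and codec agree
mismatched = decode(utf8_bytes, "gb2312")     # UTF-8 bytes read as GB2312
mismatched_rev = decode(gb_bytes, "utf-8")    # GB2312 bytes read as UTF-8

print(len(utf8_bytes), len(gb_bytes))
print(matching == text, mismatched == text, mismatched_rev == text)
```

Only the matching pair round-trips; either mismatch yields wrong or replacement characters, which is the same class of symptom as picking the wrong text converter on a file stream.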
FYI
---------- Forwarded message ----------
From: Eric <[hidden email]>
Date: 2011/1/28
Subject: Re: [squeak-dev] About Squeak 4.1 Language Support
To: kksubbu.ml@gmail.com

Thanks Subbu,
As you suggested, I tried the CNGBTextConverter that ships with Squeak 4.1. I assume it will switch the language environment to SimplifiedChineseEnvironment; am I right?
I tried the following code:
(MultiByteFileStream fileNamed: 'file.txt')
converter: (CNGBTextConverter new); contents.
It still doesn't work. Does that mean I need to import some fonts, or do something else?
Another question: in Squeak 4.1, how can I make sure that the language environment in use is Chinese (or some other language)?
My understanding is that in Squeak 4.1 the default is UTF-8 or Unicode. Where can I switch it?
I am just a beginner with Squeak Smalltalk, so please correct me if I have misunderstood anything.
Eric
2011/1/27 K. K. Subramaniam <kksubbu.ml@gmail.com>
|
In reply to this post by K K Subbu
On Friday 28 Jan 2011 10:14:43 am Eric wrote:
> And it still doesn't work, does it mean i need to import some fonts for it
> or other ways?
Were you able to inspect the decoded character values? If the decoding is right, then the issue is with fonts. See the isFontAvailable and installFont methods in LanguageEnvironment to see how to load language-specific fonts on startup. The GreekEnvironment and JapaneseEnvironment classes depend on such external font files.
> Also another question is in Squeak4.1 what could i do to make sure the the
> language environment I use is chinese or some others?
What is your locale setting? Inspect:
Locale current localeID
SimplifiedChineseEnvironment>>supportedLanguages contains 'zh'.
> My understanding now in Squeak4.1 that the default language is UTF8 or
> Unicode, where could i do the switch?
Unfortunately not, because of the need for backward compatibility across multiple OSes and platforms. To select the right converter, use the class-side methods in SimplifiedChineseEnvironment. See JapaneseEnvironment for help.
> I am just a beginner on Squeak Smalltalk, please correct me if I make any
> mistakes on understanding
Generally, requests to this mailing list receive brief responses. If you need more detail, you could try the beginners mailing list. Squeak needs people like you to increase support for Chinese locales.
Subbu |
The decoding is correct. For example, for the Chinese character '中', the value decoded from UTF-8 is 25185837, but the workspace displays ??.
I inspected:
(SimplifiedChineseEnvironment new) isFontAvailable,
and it returned false.
Then I ran the following to install the font:
(Locale isoLanguage: 'zh') languageEnvironment fontDownload.
There seems to be no font available on the net. Any suggestions on how I could get a font for Chinese?
I have also read the Chinese support code developed by Yo. It works on Squeak 3.8 and 3.9, but not on 4.1. I don't understand what has changed in language support in Squeak 4.1.
Subbu |
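The value 25185837 reported above looks odd next to the Unicode code point of '中' (U+4E2D, decimal 20013), but the arithmetic suggests it is that code point with a language tag stored in the upper bits. The Python sketch below assumes that Squeak packs a "leading char" above a 22-bit code; that bit layout is an assumption here and is not stated anywhere in the thread.

```python
# ASSUMPTION: character value = (leading char << 22) + code point.
LEADING_CHAR_SHIFT = 22


def split_char_value(value: int) -> tuple[int, int]:
    """Split a packed character value into (leading_char, code_point)."""
    leading_char = value >> LEADING_CHAR_SHIFT
    code_point = value & ((1 << LEADING_CHAR_SHIFT) - 1)
    return leading_char, code_point


leading_char, code_point = split_char_value(25185837)
print(leading_char, hex(code_point), chr(code_point))  # prints: 6 0x4e2d 中
```

Under that assumption the low bits are exactly U+4E2D, which is consistent with the decoding being right and the ?? coming from a missing font rather than from the converter.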
I found another strange thing:
when I inspect "TextStyle defaultFont" in Squeak 3 (3.8, 3.9, 3.10), it is displayed as a StrikeFontSet, but in Squeak 4 it is displayed as a StrikeFont.
Also, the input interpreter works in 3.8 and 3.9 but doesn't work in 3.10. Any suggestions on how to debug this kind of method?
Here is the input interpreter:
'From Squeak3.10.2 of ''5 June 2008'' [latest update: #7179] on 30 January 2011 at 4:58:28 pm'!
KeyboardInputInterpreter subclass: #WinGB2312InputInterpreter
	instanceVariableNames: 'converter'
	classVariableNames: ''
	poolDictionaries: ''
	category: 'Multilingual-TextConversion'!

!WinGB2312InputInterpreter methodsFor: 'all' stamp: '1 1/30/2011 16:55'!
initialize
	converter := CNGBTextConverter new.
! !

!WinGB2312InputInterpreter methodsFor: 'all' stamp: 'yo 6/20/2006 11:11'!
nextCharFrom: sensor firstEvt: evtBuf
	| firstCharacter secondCharacter peekEvent char1Value keyValue pressType type stream multiCharacter |
	keyValue := evtBuf third.
	pressType := evtBuf fourth.
	pressType = EventKeyDown ifTrue: [type := #keyDown].
	pressType = EventKeyUp ifTrue: [type := #keyUp].
	pressType = EventKeyChar ifTrue: [type := #keystroke].
	char1Value := (Character value: keyValue) macToSqueak asciiValue.
	"Values outside the GB2312 lead-byte range are returned as single-byte characters."
	(char1Value > 16rA0 and: [char1Value < 16rF8]) ifFalse: [^ keyValue asCharacter].
	peekEvent := sensor peekEvent.
	peekEvent ifNotNil: [pressType := peekEvent fourth].
	"peekEvent printString displayAt: [hidden email]."
	(peekEvent notNil and: [(peekEvent at: 4) = EventKeyDown]) ifTrue: [
		sensor nextEvent.
		peekEvent := sensor peekEvent.
		peekEvent ifNotNil: [pressType := peekEvent fourth]].
	"If a second keystroke is pending, decode both bytes together through the converter."
	(type = #keystroke and: [peekEvent notNil and: [(peekEvent at: 1) = EventTypeKeyboard and: [(peekEvent at: 4) = EventKeyChar]]]) ifTrue: [
		firstCharacter := char1Value asCharacter.
		secondCharacter := (peekEvent at: 3) asCharacter macToSqueak.
		stream := ReadStream on: (String with: firstCharacter with: secondCharacter).
		multiCharacter := converter nextFromStream: stream.
		multiCharacter isCharacter ifTrue: [
			multiCharacter isOctetCharacter ifFalse: [sensor nextEvent]].
		^ multiCharacter].
	^ keyValue asCharacter.
! !

On 30 January 2011 at 2:14 PM, Eric <[hidden email]> wrote:
|
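The interpreter above treats a key value strictly between 16rA0 and 16rF8 as the first byte of a two-byte GB2312 character and decodes it together with the next keystroke. The Python sketch below mirrors only that pairing rule on a plain byte string. It is a simplification and not a port: the function names are hypothetical, and the event handling (key down/up, peeking at the sensor) is left out.

```python
def is_gb2312_lead_byte(b: int) -> bool:
    """Same range test as the interpreter: 0xA0 < b < 0xF8."""
    return 0xA0 < b < 0xF8


def decode_key_bytes(data: bytes) -> str:
    """Pair each lead byte with the following byte and decode as GB2312."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if is_gb2312_lead_byte(b) and i + 1 < len(data):
            pair = bytes([b, data[i + 1]])
            out.append(pair.decode("gb2312", errors="replace"))
            i += 2
        else:
            # Single-byte character, or a lead byte with nothing after it.
            out.append(chr(b))
            i += 1
    return "".join(out)


print(decode_key_bytes(b"a" + "中".encode("gb2312") + b"b"))  # prints: a中b
```

If the platform delivers UTF-8 or Unicode key values instead of GB2312 bytes, this pairing rule would misread them, which may be one reason the same interpreter behaves differently across Squeak versions and VMs.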
In reply to this post by ericgao
On Sunday 30 Jan 2011 11:44:16 am Eric wrote:
> and i run following to installFont:
> (Locale isoLanguage: 'zh') languageEnvironment fontDownload.
> seems not font support on the net, any suggestion on how could i got the
> font for chinese?
You need to leave a FontSimplifiedChineseEnvironment.sar in the default directory for Squeak to load it automatically. Have you seen http://code.google.com/p/chinesesqueak/ ? I can't read Chinese and have no idea what those forums contain :-( but I did see some posters including the SAR file for Chinese support.
HTH .. Subbu |
Yes, I have downloaded the code with
svn checkout http://chinesesqueak.googlecode.com/svn/
Regarding Chinese characters, the code works on Squeak 3.8 and 3.9,
but with this code, 3.10 has a problem when typing Chinese characters.
Also, Squeak 4.1 seems to have changed a lot from the earlier base; am I right?
2011/1/30 K. K. Subramaniam <kksubbu.ml@gmail.com>
|
On Sunday 30 Jan 2011 10:23:46 pm Eric wrote:
> yes, i have downloaded the codes from
> svn checkout http://chinesesqueak.googlecode.com/svn/
> regarding the chinese character, the code is working for Squeak 3.8, and
> 3.9 but using these codes, 3.10 has problem on typed in chinese
> character.....
This is because of an input method conflict between supporting legacy codes and modern UTF-8 input codes across multiple platforms. See SQ-554 and its discussion in etoys-dev for the background. You can adapt the patch to make it work for your case (Squeak 4 on Windows?).
Subbu |
Thanks Subbu,
> This is because of input method conflict between supporting legacy codes and
> modern UTF-8 input codes across multiple platforms.
I don't quite understand the conflict you mentioned. Could you give an example?
> See SQ-554 and its discussion in etoys-dev for the background. You can adapt
> the patch to make it work for your case (Squeak 4 on Windows?).
Also, for SQ-554, what is the URL of the defect report? Do you mean the patch in etoys-dev?
Eric |
On Tuesday 01 Feb 2011 3:40:10 pm Eric wrote:
> Thanks Subbu,
>
> >This is because of input method conflict between supporting legacy codes
> > and modern UTF-8 input codes across multiple platforms.
> Don't very understand about the conflict you mentioned, could you give
> some example?
See the threads
http://lists.squeakland.org/pipermail/etoys-dev/2010-October/005933.html
http://forum.world.st/New-Win32-VM-m17n-testers-needed-td63730.html
In a multilingual setting, the language of the input codes can be different from what is implied by the locale setting (system language). So one can encounter Chinese input in a Latin-1 locale. The image has methods to switch locales manually, but not converters :-(. The logic for arriving at language+encoding is different on different platforms. Part of this logic lies in the VM and part of it in the image. VM ports on Windows, Mac and Linux use different logic to deal with multi-byte input methods, so the code in the image has to dance around with VM host, locale setting and language codes to collect input. If you are not likely to read files encoded in pre-Unicode Chinese codes and you are just interested in getting Squeak to deal with Chinese input on the Wintel VM, just apply a patch to override all this logic for now. Hopefully, better code will emerge as we get more feedback from Chinese users.
> >See SQ-554 and its discussion in etoys-dev for the background. You can
> > adapt the patch to make it work for your case (Squeak 4 on Windows?).
> Also for SQ-554, which defect report URL for this item? Do you mean the
> patch in etoys-dev?
Yes. In my patch, I tried to get the UTF-8 encoding through the VM into Squeak, where I could decode uniformly. It worked on the Linux VM but not on Win32. I don't have Win32 development machines around, so I couldn't take it further.
Subbu |