Hi Squeakers,
I have already extended String with TwoByteString and added "scaling": a string is automatically converted to a wider string when a wider character is put into it. So far so good, and this already works in Aida/Web.

I also wrote a somewhat better UTF-8 conversion, but it is only 25-80% faster than the existing one in UTF8TextConverter.

To prepare for even better results, I made a benchmark that measures conversion time for English, French, Slovenian, Russian and Chinese texts, each 2500 characters long. It measures 100 conversions, which adds up to 250K characters of text.

Here are the results in VW, and in Squeak with the old UTF-8 converter and the new one:

            VW   old   new
english     30   313   248   ByteString, pure ASCII
french      32   323   251   ByteString, ISO8859-1 (Latin 1)
slovenian   48   578   480   TwoByteString, Latin 2
russian    112  1306   720   TwoByteString, Cyrillic
chinese    107  1544  3825   TwoByteString

Notice the exceptional 10x VW performance compared to Squeak, and they do all encodings in plain Smalltalk! No primitives! So how come Squeak is so slow here?

The above benchmark was done on Squeak 3.9 on Suse Linux 10.1, P3.2GHz.

Best regards
Janko

--
Janko Mivšek
AIDA/Web
Smalltalk Web Application Server
http://www.aidaweb.si
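For readers who want to try a measurement of this shape themselves, here is a minimal workspace sketch of the timing loop described above (100 conversions of a 2500-character text). It is not the benchmark code from the post, which was not published in this message. The `convert` block is a placeholder, and `convertToEncoding:` is assumed to be available on String in the image at hand; swap in whichever converter is being measured.

```smalltalk
"Sketch of the timing loop only; not the benchmark code from the post.
 `convert` is a placeholder block for the converter under test."
| text convert runs elapsed |
text := String new: 2500 withAll: $a.	"2500 characters of pure ASCII"
convert := [:s | s convertToEncoding: 'utf-8'].	"assumed selector"
runs := 100.
elapsed := Time millisecondsToRun: [runs timesRepeat: [convert value: text]].
Transcript
	show: 'ascii: ', elapsed printString, ' ms for ',
		(runs * text size) printString, ' characters';
	cr.
```

To cover the other rows of the table, `text` would be replaced by French, Slovenian, Russian and Chinese samples of the same length.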
With corrected table of results:
>             VW   old   new
> english     30   313   248   ByteString, pure ASCII
> french      32   323   251   ByteString, ISO8859-1 (Latin 1)
> slovenian   48   578   480   TwoByteString, Latin 2
> russian    112  1306   720   TwoByteString, Cyrillic
> chinese    107  1544  3825   TwoByteString
In reply to this post by Janko Mivšek
Hi, Janko,
> I also did a bit better UTF8 conversion but it is only 25-80% faster than
> the existing one in UTF8TextConverter.

  Good!

> Here are results in VW, Squeak with old UTF8 converter and a new one:
>
>             VW   old   new
> english     30   313   248   ByteString, pure ASCII
> french      32   323   251   ByteString, ISO8859-1 (Latin 1)
> slovenian   48   578   480   TwoByteString, Latin 2
> russian    112  1306   720   TwoByteString, Cyrillic
> chinese    107  1544  3825   TwoByteString
>
> Notice an exceptional 10x VW performance comparing to Squeak, and they
> do all encodings in plain Smalltalk! No primitives! So how come that
> Squeak is so slow here?

  Is it true that you traded the performance for Chinese against the other languages?

  BTW, I can't see the difference between this and your "With corrected table of results:".

- UTF8TextConverter wasn't written with performance in mind (as you can tell ^^;)
- This kind of tight loop gives a factor of 3-5 performance difference between VW and Squeak, plus,
- the immediate representation for characters must be helping a lot.

  For the OLPC, I think I will end up writing primitives for Squeak. One could say that I should like the iconv library, but I am not sure if that is a good idea or not...

-- Yoshiki
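To make concrete what kind of tight loop is under discussion, here is a minimal sketch of a UTF-8 encoder's inner loop. It works on plain Integer code points rather than Characters, so it does not depend on any particular string or character representation. It is illustrative only: it is neither the code in UTF8TextConverter nor the new converter from the benchmark, and it does not reject invalid input such as surrogate code points.

```smalltalk
"Encode a collection of Unicode code points (plain Integers) as UTF-8 bytes.
 Illustrative only; not UTF8TextConverter and not the benchmarked converter."
| codePoints out |
codePoints := #(65 233 269 1046 20013).	"A, é, č, Ж, 中"
out := WriteStream on: (ByteArray new: 16).
codePoints do: [:cp |
	cp < 16r80
		ifTrue: [out nextPut: cp]	"1 byte: ASCII"
		ifFalse: [cp < 16r800
			ifTrue: [	"2 bytes: Latin, Cyrillic, ..."
				out nextPut: (16rC0 bitOr: (cp bitShift: -6)).
				out nextPut: (16r80 bitOr: (cp bitAnd: 16r3F))]
			ifFalse: [cp < 16r10000
				ifTrue: [	"3 bytes: rest of the BMP, including CJK"
					out nextPut: (16rE0 bitOr: (cp bitShift: -12)).
					out nextPut: (16r80 bitOr: ((cp bitShift: -6) bitAnd: 16r3F)).
					out nextPut: (16r80 bitOr: (cp bitAnd: 16r3F))]
				ifFalse: [	"4 bytes: beyond the BMP"
					out nextPut: (16rF0 bitOr: (cp bitShift: -18)).
					out nextPut: (16r80 bitOr: ((cp bitShift: -12) bitAnd: 16r3F)).
					out nextPut: (16r80 bitOr: ((cp bitShift: -6) bitAnd: 16r3F)).
					out nextPut: (16r80 bitOr: (cp bitAnd: 16r3F))]]]].
out contents
```

For the five sample code points the result is 10 bytes long (1 + 2 + 2 + 2 + 3). Per character the work is a few comparisons, shifts and stream writes, which is why per-send and per-element overhead in the loop shows up so directly in the timings.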
Hi Yoshiki,
Yoshiki Ohshima wrote:
>> I also did a bit better UTF8 conversion but it is only 25-80% faster than
>> the existing one in UTF8TextConverter.
>
> Good!
>
>> Here are results in VW, Squeak with old UTF8 converter and a new one:
>>
>>             VW   old   new
>> english     30   313   248   ByteString, pure ASCII
>> french      32   323   251   ByteString, ISO8859-1 (Latin 1)
>> slovenian   48   578   480   TwoByteString, Latin 2
>> russian    112  1306   720   TwoByteString, Cyrillic
>> chinese    107  1544  3825   TwoByteString
>>
>> Notice an exceptional 10x VW performance comparing to Squeak, and they
>> do all encodings in plain Smalltalk! No primitives! So how come that
>> Squeak is so slow here?
>
> Is it true that you traded the performance for
> Chinese with other languages?

Definitely not, and I just don't understand why Chinese is so slow. I hope you'll be able to look at the code and see what's wrong. And Chinese is close to Japanese, right? I learned a bit of Chinese 20 years ago, but that was not much help - I forgot too much :)

I'll prepare and publish the code and benchmark tomorrow.

> BTW, I can't see the difference between this and your "With
> corrected table of results:".

The "corrected" should have been "with corrected layout", just that. Sorry for the ambiguity.

--
Janko Mivšek
AIDA/Web
Smalltalk Web Application Server
http://www.aidaweb.si
Hi,
On Tue, 12 Jun 2007 00:28:24 +0200, you wrote:

>Definitively not, and I just don't understand why Chinese is so slow. I
>hope you'll be able too look at that code to see, what's wrong. And
>Chinese is close to Japanese, right? I learned Chinese a bit 20 years
>ago, but this was not of much help - I forgot too much :)
>
>I'll prepare and publish code and benchmark tomorrow.

Interesting to hear that Unicode encoding is possible in Squeak. But the new VM/image that I installed and ran on a Windows XP box showed strange characters in the title and for any Chinese characters entered. I think there is still much conversion work to be done to overcome the anomalies. Maybe someone has a more fully converted and stable image for testing ... :-)

Best regards.

tgkuo
Hello,
> Interesting to hear that unicode encoding is possible upon Squeak.

  Well, it has been possible for quite a long time.

> But the new VM/Image that I installed and run under Windows XP box
> showed strange characters for the title and any Chinese characters
> entered. There is still much conversion done to overcome the
> anomalies, I think. Maybe someone had more converted and stable image
> for testing ... :-) .

  Can you be a bit more specific? Which "new VM/Image" did you try?

  BTW, have you looked at:

http://www.smalltalk.org.cn/squeakr/squeakdownload0.html

?

-- Yoshiki
Hi,
On Mon, 11 Jun 2007 20:36:27 -0700, you wrote:

> BTW, have you looked at:
>
>http://www.smalltalk.org.cn/squeakr/squeakdownload0.html

It's a Simplified Chinese edition, which is not usable in our country: we live in a Traditional Chinese, Big5-encoding world.

Anyway, thanks.

tgkuo
In reply to this post by tgkuo
Hi tgkuo,
Here is proof that Unicode is definitely possible in Squeak. I put some English, French, Slovenian, Russian and Chinese on the same web page:

http://mivsek.eranova.si:8888/

This is done with the help of Aida/Web on Squeak 3.9. This image has the Unicode patch installed, but it would probably work the same on plain 3.9 too.

Best regards
Janko

--
Janko Mivšek
AIDA/Web
Smalltalk Web Application Server
http://www.aidaweb.si
> This is done by help of Aida/Web on Squeak 3.9. This image has Unicode
> patch installed but it would probably work the same on plain 3.9 too.

It already did in 3.8, as tested and demonstrated in the comments of the following blog post. The comments were written in a 3.8 image, but nowadays this application runs on 3.9:

http://www.lukas-renggli.ch/blog/studenckifestwal

Lukas

--
Lukas Renggli
http://www.lukas-renggli.ch
In reply to this post by Janko Mivšek
Janko,
> Here is a proof that Unicode is definitively possible in Squeak. I put
> on a same web page some English, French, Slovenian, Russian and Chinese:
>
> http://mivsek.eranova.si:8888/
>
> This is done by help of Aida/Web on Squeak 3.9. This image has Unicode
> patch installed but it would probably work the same on plain 3.9 too.

  This could be a hunch and could be completely wrong, but have you tried to *use* Slovenian in Squeak, in the sense of typing it in from the keyboard, displaying it in a workspace, and using it in class names, variable names, etc.?

  To try it, take the OLPC image for example, gunzip the attached latin2.out, and evaluate the following in a workspace:

    StrikeFontSet
        installExternalFontFileName6: 'latin2.out'
        encoding: Latin2Environment leadingChar
        encodingName: #Latin2
        textStyleName: #DefaultMultiStyle.

  And then, evaluate:

    Locale currentPlatform: (Locale localeID: (LocaleID isoString: 'sl')).

and save the image. If you are on Windows, this would let you type Latin 2 characters, display them, and use them everywhere in the image...

latin2.out.gz (27K)
In reply to this post by tgkuo
TG Kuo,
> It's a simplified-Chinese based edition, not usable at our country
> which is a traditional-Chinese big-5 encoding world.
>
> anyway, thanks.

  BTW, it is almost just a matter of selecting good fonts and making an "environment" for Traditional Chinese. As we do have reserved IDs for both Traditional Chinese and Simplified Chinese, it should be really easy once somebody who really wants to have it does it. And I'm more than happy to help.

-- Yoshiki
In reply to this post by Yoshiki Ohshima
Hi Yoshiki
I'm a Chinese user, actually a Traditional Chinese one (using Big5 encoding).

I'm bothered by this, too. Squeak cannot show our fonts in the UI, though there is no problem in the web pages rendered by Seaside 3.0. That means the encoding is correct (UTF-8) but the font is missing.

Why can't Squeak display the text correctly as other programs do? Is it related to its own way of interpreting the encoding, rather than going through the OS API?

Some questions I would like to ask:

How is the file latin2.out made?

Can we just use a Windows system font instead?

If not, how could we make chinese.out?
On Thursday 11 Nov 2010 1:46:36 pm Eno wrote:
> Some questions want to ask:
>
> How is the file latin2.out made?

This is a file-out of a set (Array) of StrikeFont objects.

> if not, how could we make chinese.out ?

You have to create a bunch of StrikeFonts and write a function to file them out and load them back in. See the class-side methods in StrikeFontSet:
  createExternalFontFile.... for creating such files, and
  installExternalFontFile.... for loading them back.

Subbu
If I remember correctly, there was Chinese support in the past, so searching the mail archive might be helpful.

--Hannes
In reply to this post by Eno
At Thu, 11 Nov 2010 00:16:36 -0800 (PST),
Eno wrote:
>
> Hi Yoshiki
>
> I'm a Chinese user, actually traditional Chinese ( using big5 encoding).
>
> I'm bothered by it, too, Squeak could not show our fonts in the UI, though
> there is no problem in the rendered web pages by Seaside 3.0. That means the
> encoding is correct (UTF-8) but the font is missing.
>
> Why squeak could not display the encoding correctly as others do? is it
> related it's unique way of interpreting the encoding, not by the OS API.
>
> Some questions want to ask:
>
> How is the file latin2.out made?
>
> Can we just use windows system font instead?
>
> if not, how could we make chinese.out ?

  Sorry for taking forever to answer this. I dug up the font files I created a while ago and uploaded them to:

http://tinlizzie.org/~ohshima/uSimplifiedChineseFont.out
http://tinlizzie.org/~ohshima/uTraditionalChineseFont.out

  In the Etoys development image, you can load one by evaluating:

    StrikeFontSet
        installExternalFontFileName: 'uSimplifiedChineseFont.out'
        encoding: SimplifiedChineseEnvironment leadingChar
        encodingName: #SimplifiedChinese
        textStyleName: #DefaultMultiStyle.

  The trunk image seems to be missing some methods needed to make it run, however.

-- Yoshiki
Hi,
I downloaded the development image from http://etoys.squeak.org/download/ , ran the default image and evaluated the code, but it still cannot show the font as expected.

I suppose it can't work because what I need on my WinXP PC is TraditionalChineseEnvironment.

I searched the system for the class "TraditionalChineseEnvironment", but it is missing. There are only a few environments available in the Multilingual-languages package. How can I make the "TraditionalChineseEnvironment" one? I currently have no idea how to write the class methods that need to be changed for TraditionalChineseEnvironment. Can you help me build one?

Thanks.

Best regards,
Eno
In reply to this post by Eno
At Wed, 15 Dec 2010 18:54:26 +0800,
tgkuo wrote:
>
> Hi,
>
> I downloaded the development image from
> http://etoys.squeak.org/download/ , run the default image, evaluate the
> code, but still can not show the font as expected.
>
> I suppose that it can't work because what I needed is
> TraditionalChineseEnvironment in my WinXP PC.
>
> I explored the system for the class "TraditionalChineseEnvironment" but
> it is missing.
> There is only a few environment available in Multilingual-languages
> package, how can I make the "TraditionalChineseEnvironment" one, I
> currently had no ideas to write the class methods that need to be
> changed for TraditionalChineseEnvironment, can you help me to build one?

  Again, sorry for the slow response.

  If you don't need support for CNS 11643 (or Big5), SimplifiedChineseEnvironment can mimic, say, the Russian environment. You would need a new "leading char" assigned (just get the next available one), and you would need to define a font you would like to use.

  Nowadays, you might want to consider using TrueType fonts. The OLPC image includes a method #makeSmartRefFilesFrom:encodingTag:ranges:outputFileName: in TTCFontSet. You define the range you're interested in, choose a TT font that covers it, and create a ".out" file.

-- Yoshiki
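As background on what a "leading char" is for, here is a small arithmetic sketch using plain Integers. It assumes the layout used by the multilingual Character encoding in Squeak images of that period, as far as I recall it: the low 22 bits hold the character code and the leading char sits above them. Please check the Character class in your own image before relying on this. The leading char value 17 is purely hypothetical.

```smalltalk
"How a leading char tags a character code, shown with plain Integers.
 Assumption: the low 22 bits hold the code, the leading char sits above them."
| leadingChar code tagged |
leadingChar := 17.	"hypothetical value; use the next free one in your image"
code := 16r4E2D.	"Unicode code point of 中"
tagged := (leadingChar bitShift: 22) + code.
"Taking the tagged value apart again:"
(tagged bitShift: -22) = leadingChar.	"true"
(tagged bitAnd: 16r3FFFFF) = code.	"true"
```

Under that assumption, the same code point tagged with two different leading chars gives two different character values, which is how an environment can select its own fonts for the characters it owns.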