Hi!
I am looking for up-to-date links (or tips) on how to work with UTF-8 (instead of the default Latin-1) inside Squeak.
Thank you in advance for your help.
Pierre-Edouard |
2009/3/27 Pierre-Edouard PORTIER <[hidden email]>:
> Hi !
> I am in search of up to date links (or tips) on how to work with UTF-8
> (instead of the default latin1) inside Squeak.
> I thank you in advance for your help.
> Pierre-Edouard

aString convertToEncoding: 'utf-8'
aString convertFromEncoding: 'utf-8'

Cheers
Philippe |
In reply to this post by peportier
On Fri, Mar 27, 2009 at 8:56 PM, Pierre-Edouard PORTIER
<[hidden email]> wrote:
> I am in search of up to date links (or tips) on how to work with UTF-8
> (instead of the default latin1) inside Squeak.
> I thank you in advance for your help.

http://article.gmane.org/gmane.comp.lang.smalltalk.pharo.devel/5065/match=looking+unicode+testers

--
Damien Cassou
http://damiencassou.seasidehosting.st |
In reply to this post by Philippe Marschall
Thank you Philippe,
I was aware of:

aString squeakToUtf8
aString utf8ToSqueak

But I would like to be able to *see* utf-8 characters inside the squeak environment.

Cheers
Pierre-Edouard

On Sat, Mar 28, 2009 at 5:20 AM, Philippe Marschall <[hidden email]> wrote:
2009/3/27 Pierre-Edouard PORTIER <[hidden email]>: |
In reply to this post by Damien Cassou-3
Thank you Damien,
I will be a tester of this fork.
Pierre-Edouard

On Sat, Mar 28, 2009 at 11:03 AM, Damien Cassou <[hidden email]> wrote:
|
In reply to this post by Pierre-Edouard PORTIER
2009/3/28 Pierre-Edouard PORTIER <[hidden email]>:
> Thank you Philippe,
>
> I was aware of :
> aString squeakToUtf8
> aString utf8ToSqueak
>
> But I would like to be able to *see* utf-8 characters inside the squeak
> environment.

What do you mean by that? What do you understand by a UTF-8 character?

Cheers
Philippe |
In reply to this post by Pierre-Edouard PORTIER
On Sat, Mar 28, 2009 at 11:55 AM, Pierre-Edouard PORTIER
<[hidden email]> wrote:
> But I would like to be able to *see* utf-8 characters inside the squeak
> environment.

Are you sure you are not confusing "utf-8" with "unicode"? utf-8 is just one way of encoding unicode (characters).
You can import utf-8 encoded characters/strings, but once inside Squeak they are kept as unicode characters.

Michael |
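Here is Michael's distinction as a small sketch. It is in Python, not Squeak Smalltalk, chosen only because it runs anywhere. The same two-character string serializes to different bytes under UTF-8 and UTF-16, yet decoding either byte sequence gives back identical code points.

```python
# Python illustration of the point above (not Squeak code): one Unicode
# string, two different byte encodings.
text = "A\u03b1"  # "A" followed by GREEK SMALL LETTER ALPHA (U+03B1)

utf8_bytes = text.encode("utf-8")       # b'A\xce\xb1' (alpha takes 2 bytes)
utf16_bytes = text.encode("utf-16-le")  # b'A\x00\xb1\x03' (2 bytes per char)

# The byte sequences differ...
assert utf8_bytes != utf16_bytes

# ...but decoding either one yields the same code points.
assert utf8_bytes.decode("utf-8") == utf16_bytes.decode("utf-16-le") == text
print([hex(ord(c)) for c in text])  # ['0x41', '0x3b1']
```

Once decoded, the program only deals with code points. UTF-8 matters again only when the text leaves the program.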
2009/3/29 Michael Rueger <[hidden email]>:
> On Sat, Mar 28, 2009 at 11:55 AM, Pierre-Edouard PORTIER
> <[hidden email]> wrote:
>
>> But I would like to be able to *see* utf-8 characters inside the squeak
>> environment.
>
> Are you sure you are not confusing "utf-8" with "unicode"? utf-8 is
> just one way of encoding unicode (characters).
> You can import utf-8 encoded characters/strings, but once inside
> Squeak they are kept as unicode characters.

Plus leadingChar, which causes a lot of problems for web applications.

Cheers
Philippe |
Philippe Marschall wrote:
> Michael Rueger:
>> Pierre-Edouard PORTIER wrote:
>>> But I would like to be able to *see* utf-8 characters inside the squeak
>>> environment.
>> Are you sure you are not confusing "utf-8" with "unicode"? utf-8 is
>> just one way of encoding unicode (characters).
>> You can import utf-8 encoded characters/strings, but once inside
>> Squeak they are kept as unicode characters.
> Plus leadingChar, which causes a lot of problems for web applications.

We don't have any problems with Squeak Unicode in Aida/Web apps, probably because we strictly use Unicode internally, not UTF-8 encoded strings. All such strings are then encoded to and decoded from UTF-8 "at the edge" of the image by the Aida web framework.

Best regards
Janko

--
Janko Mivšek
AIDA/Web
Smalltalk Web Application Server
http://www.aidaweb.si |
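Here is a minimal sketch of the "decode at the edge" pattern Janko describes. It is in Python and is not taken from Aida; `handle_request` and `render_page` are hypothetical names. The only point is that bytes are decoded once on input and encoded once on output, so the code in between works purely on Unicode strings.

```python
# Minimal sketch of "decode at the edge" (Python, not Aida's actual code).
# handle_request and render_page are hypothetical names for illustration.

def render_page(title: str) -> str:
    # Application code only ever sees decoded Unicode strings.
    return f"<h1>{title.upper()}</h1>"

def handle_request(body: bytes) -> bytes:
    title = body.decode("utf-8")   # edge: bytes -> Unicode on the way in
    html = render_page(title)      # inside: pure Unicode processing
    return html.encode("utf-8")    # edge: Unicode -> bytes on the way out

response = handle_request("Привет".encode("utf-8"))
print(response.decode("utf-8"))  # <h1>ПРИВЕТ</h1>
```

Uppercasing the Cyrillic word works here only because `render_page` receives decoded characters rather than raw UTF-8 bytes. This sketch says nothing about how the leadingChar question raised later in the thread is handled.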
2009/3/29 Janko Mivšek <[hidden email]>:
> Philippe Marschall wrote:
>> Michael Rueger:
>>> Pierre-Edouard PORTIER wrote:
>
>>>> But I would like to be able to *see* utf-8 characters inside the squeak
>>>> environment.
>
>>> Are you sure you are not confusing "utf-8" with "unicode"? utf-8 is
>>> just one way of encoding unicode (characters).
>>> You can import utf-8 encoded characters/strings, but once inside
>>> Squeak they are kept as unicode characters.
>
>> Plus leadingChar, which causes a lot of problems for web applications.
>
> We don't have any problems with Squeak Unicode in Aida/Web apps,
> probably because we strictly use Unicode internally, not the UTF-8
> encoded strings. All such strings are then encoded/decoded to the UTF-8
> "at the edge" of image by Aida web framework.

What leadingChar do you use? The image's?

Cheers
Philippe |
In reply to this post by Janko Mivšek
2009/3/29 Janko Mivšek <[hidden email]>:
> Philippe Marschall wrote:
>> Michael Rueger:
>>> Pierre-Edouard PORTIER wrote:
>
>>>> But I would like to be able to *see* utf-8 characters inside the squeak
>>>> environment.
>
>>> Are you sure you are not confusing "utf-8" with "unicode"? utf-8 is
>>> just one way of encoding unicode (characters).
>>> You can import utf-8 encoded characters/strings, but once inside
>>> Squeak they are kept as unicode characters.
>
>> Plus leadingChar, which causes a lot of problems for web applications.
>
> We don't have any problems with Squeak Unicode in Aida/Web apps,
> probably because we strictly use Unicode internally,

You cannot do that. Squeak stores the language of a character in every character. In a web application you don't know the language of the input, and UTF-8 certainly doesn't contain it. You could take the language of the image, but that is random and has no relation to the input. You could also set the language of a character to Unicode (255), but that only works for non-Latin-1 characters: Latin-1 characters are interned and all have leadingChar 0. Did I already mention that the leadingChar is used for #=? So no, I don't believe you.

Cheers
Philippe |
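Philippe's objection can be shown with a simplified model. The `TaggedChar` class and `make_char` helper below are hypothetical Python, not Squeak's Character implementation. The model assumes only what the thread states:

- each character carries a leading char (language tag) besides its code point;
- code points below 256 are interned with tag 0;
- equality compares both fields.

```python
from dataclasses import dataclass

# Simplified, hypothetical model of the issue above (not Squeak's code):
# a character is a (leading_char, code_point) pair, and the generated
# __eq__ of the dataclass compares BOTH fields.
@dataclass(frozen=True)
class TaggedChar:
    leading_char: int  # language tag; 0 and 255 are the values named in the thread
    code_point: int

def make_char(code_point: int, leading_char: int) -> TaggedChar:
    # Model of interning: code points below 256 always get tag 0,
    # whatever tag was requested.
    if code_point < 256:
        leading_char = 0
    return TaggedChar(leading_char, code_point)

a = make_char(0x3B1, 255)  # Greek alpha tagged as "unicode"
b = make_char(0x3B1, 0)    # same code point, different tag
print(a == b)              # False: only the tags differ

c = make_char(0xE9, 255)   # "é" is below 256, so its tag is forced to 0
print(c.leading_char)      # 0
```

Under these assumptions, two strings with the same code points but built with different tags compare unequal. That is why Philippe asks which leadingChar a web application assigns to input whose language it cannot know.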
> You can not do that. Squeak stores the language of a character in
> every character. In a web application you don't know the language of
> the input and utf-8 certainly doesn't contain it. You could take the
> language of the image but that is random and has no relation to the
> input. You could also set the language of a character to unicode (255)
> but that only works for non-Latin-1 characters, these are interned and
> all have leadingChar 0. Did I already mention that the leadingChar is
> used for #=? So no, I don't believe you.
>
> Cheers
> Philippe
>

It seems most reasonable to me to switch the Unicode leadingChar to 0. Why couldn't we just do that?

Of course, all this does not really answer Pierre-Edouard's questions...
Pierre, what do you want Unicode for?
- displaying any arbitrary character inside Squeak
- inputting any character with the keyboard in Squeak
- exchanging files made of arbitrary characters with the external world (UTF-8, UTF-16 or other formats)
- reading and writing filenames containing arbitrary characters
- anything else?

Nicolas |
Hi Nicolas!

Thank you for this nice synthesis. I want to:
- display any arbitrary character inside Squeak (for example Greek characters)
- input any character with the keyboard inside Squeak
- exchange UTF-8 encoded data with the external world

Pierre-Edouard

On Sun, Mar 29, 2009 at 2:42 PM, Nicolas Cellier <[hidden email]> wrote:
|
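For Pierre-Edouard's Greek example, a short Python sketch (not Squeak) shows why the Latin-1 default is not enough. Latin-1 covers only code points 0 to 255, while Greek letters sit above that range, so only a Unicode encoding such as UTF-8 can carry them.

```python
# Python illustration (not Squeak): Latin-1 vs UTF-8 for Greek text.
greek = "\u03b1\u03b2\u03b3"  # "αβγ"

# Latin-1 defines only code points 0..255; these Greek letters lie above it.
print([ord(ch) for ch in greek])  # [945, 946, 947]

try:
    greek.encode("latin-1")
except UnicodeEncodeError as err:
    # Raised because these characters have no Latin-1 byte at all.
    print("not representable in Latin-1:", err.reason)

# UTF-8 can encode every Unicode code point (two bytes per letter here).
print(greek.encode("utf-8"))  # b'\xce\xb1\xce\xb2\xce\xb3'
```

This covers only the data side (his third point). Displaying and typing such characters also depends on fonts and input handling, as the later messages explain.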
In reply to this post by Philippe Marschall
Philippe Marschall wrote:
> Janko Mivšek:
>> We don't have any problems with Squeak Unicode in Aida/Web apps,
>> probably because we strictly use Unicode internally,
> You can not do that. Squeak stores the language of a character in
> every character. In a web application you don't know the language of
> the input and utf-8 certainly doesn't contain it. You could take the
> language of the image but that is random and has no relation to the
> input. You could also set the language of a character to unicode (255)
> but that only works for non-Latin-1 characters, these are interned and
> all have leadingChar 0. Did I already mention that the leadingChar is
> used for #=? So no, I don't believe you.

Well, you should believe me, I have proof!

Look at this Aida/Scribo multilingual demo served from a Squeak image: http://demo.bioskop.fr/wiki/wiki.html, especially the Japanese and Russian text. Even Japanese URLs work correctly: http://demo.bioskop.fr/wiki/%E3%83%86%E3%82%B9%E3%83%88.html

As for the leading character, I don't even know what it is, except in theory. That is, I never ran into it as a problem when porting Aida and its i18n support to Squeak.

Best regards
Janko

--
Janko Mivšek
AIDA/Web
Smalltalk Web Application Server
http://www.aidaweb.si |
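The Japanese URL in Janko's message is ordinary percent-encoding of UTF-8 bytes. This Python sketch uses the standard `urllib.parse` module and is not Aida's code. It decodes the path segment back to text and re-encodes it; the nine percent-escaped bytes correspond to three Japanese characters.

```python
from urllib.parse import quote, unquote

# The path segment from the URL above: UTF-8 bytes, percent-encoded.
path = "%E3%83%86%E3%82%B9%E3%83%88"

title = unquote(path, encoding="utf-8")  # percent-decode, then UTF-8 decode
print(title)                             # テスト (Japanese for "test")
print(len(title), len(bytes.fromhex(path.replace("%", ""))))  # 3 9

# Percent-encoding the decoded title again yields the original path.
assert quote(title) == path
```

As Philippe replies next, a correct URL only shows that the right bytes went over the wire. It does not by itself reveal how the server represents the string internally.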
2009/3/29 Janko Mivšek <[hidden email]>:
> Philippe Marschall wrote:
>> Janko Mivšek:
>
>>> We don't have any problems with Squeak Unicode in Aida/Web apps,
>>> probably because we strictly use Unicode internally,
>
>> You can not do that. Squeak stores the language of a character in
>> every character. In a web application you don't know the language of
>> the input and utf-8 certainly doesn't contain it. You could take the
>> language of the image but that is random and has no relation to the
>> input. You could also set the language of a character to unicode (255)
>> but that only works for non-Latin-1 characters, these are interned and
>> all have leadingChar 0. Did I already mention that the leadingChar is
>> used for #=? So no, I don't believe you.
>
> Well, you should believe me, I have a proof!
>
> Look at this Aida/Scribo multilingual demo served from Squeak image:
> http://demo.bioskop.fr/wiki/wiki.html, see specially Japanese and
> Russian text. Even Japanese urls are working correctly:
> http://demo.bioskop.fr/wiki/%E3%83%86%E3%82%B9%E3%83%88.html

That's just the external representation; it tells absolutely nothing about the internal representation and the implementation. I could easily get the same result on Squeak 3.7.

> About leading character, I even don't know what is that, except in
> theory. That is, I never encounter this character as a problem when
> porting Aida and its i8n support to Squeak.

How can you seriously say everything is working fine when in practice you can't say what is happening and don't know how Strings and Characters work in Squeak? I find that rather dubious hype.

Cheers
Philippe |
Philippe Marschall wrote:
>> Look at this Aida/Scribo multilingual demo served from Squeak image:
>> http://demo.bioskop.fr/wiki/wiki.html, see specially Japanese and
>> Russian text. Even Japanese urls are working correctly:
>> http://demo.bioskop.fr/wiki/%E3%83%86%E3%82%B9%E3%83%88.html
>
> That's just external representation, that tells absolutely nothing
> about internal representation and the implementation. I could easily
> the the same result on a Squeak 3.7.

For this you need WideStrings and a proper UTF-8 converter. Does Squeak 3.7 have that?

>> About leading character, I even don't know what is that, except in
>> theory. That is, I never encounter this character as a problem when
>> porting Aida and its i8n support to Squeak.
>
> How can you seriously say everything is working fine when in practice
> you can't say what is happening and don't know how Strings and
> Characters work in Squeak? I find that quite dubious hyping.

Not hype at all, but pure reality. And coming from a country where we already need Unicode characters above code point 255, you can be sure that I know what I'm talking about. If there were some problem, I would be the first to encounter it. But there are no problems with Unicode strings prepared by Aida, so why should I bother? To me this is like premature optimization.

Note also that Masashi Umezawa, a Japanese guy, made a preview and a few modifications to Aida so that it works well with Japanese writing in all aspects, from URLs to the content. Because of his work I am even more sure that we did the Unicode support right!

Janko

--
Janko Mivšek
AIDA/Web
Smalltalk Web Application Server
http://www.aidaweb.si |
Janko Mivšek <[hidden email]> writes:
Hello Janko,

I guess Philippe is talking about in-image Japanese/Arabic/whatever. This probably needs changes to the VM. Here on Mac OS X it doesn't work: I just get empty block glyphs. It's not possible to copy non-Latin characters into the workspace. Linux VMs might handle this better.

ciao
Enno

> Note also that Masashi Umezawa, a Japanese guy, made a preview and few
> modifications to Aida to work well with Japanese writing, in all aspects
> from Urls to the content. Because of his work I'm therefore even more
> sure that we did the Unicode support right! |
In reply to this post by Janko Mivšek
2009/3/29 Janko Mivšek <[hidden email]>:
> Philippe Marschall wrote:
>
>>> Look at this Aida/Scribo multilingual demo served from Squeak image:
>>> http://demo.bioskop.fr/wiki/wiki.html, see specially Japanese and
>>> Russian text. Even Japanese urls are working correctly:
>>> http://demo.bioskop.fr/wiki/%E3%83%86%E3%82%B9%E3%83%88.html
>>
>> That's just external representation, that tells absolutely nothing
>> about internal representation and the implementation. I could easily
>> the the same result on a Squeak 3.7.
>
> For this you need WideStrings and proper UTF-8 converter.

No, you don't. You just need to emit the right bytes. The simplest way to achieve this is to return 1:1 what was inserted. This works well as long as you don't need any String semantics. This is, for example, what DabbleDB does.

> Does Squeak 3.7 has that?
>
>>> About leading character, I even don't know what is that, except in
>>> theory. That is, I never encounter this character as a problem when
>>> porting Aida and its i8n support to Squeak.
>>
>> How can you seriously say everything is working fine when in practice
>> you can't say what is happening and don't know how Strings and
>> Characters work in Squeak? I find that quite dubious hyping.
>
> Not hype at all but pure reality. And coming from country where we
> already need Unicode characters above 256, you can be sure that I know
> what I'm talking about.

Then tell us what leadingChar you use. And tell us how you address the issue that #= takes the leadingChar into account.

> If there would be some problem, I would be the
> first encountering it.

No. As I said, as long as you're just outputting the input, you won't.

> But there are no problems with Unicode strings
> prepared by Aida, so why should I bother? This is like a premature
> optimization for me.

What, getting the semantics of #= right is premature optimization? Having a working String protocol is premature optimization?
> Note also that Masashi Umezawa, a Japanese guy, made a preview and few
> modifications to Aida to work well with Japanese writing, in all aspects
> from Urls to the content. Because of his work I'm therefore even more
> sure that we did the Unicode support right!

Then tell us how it works and how it addresses the leadingChar issues outlined in this thread.

Cheers
Philippe |
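Philippe's point about returning input 1:1 can be sketched in Python. This is hypothetical code, not DabbleDB's or Aida's. Echoing undecoded UTF-8 bytes round-trips perfectly, but anything that needs String semantics, such as length or truncation, goes wrong until the bytes are decoded.

```python
# Python illustration of the point above (hypothetical, not DabbleDB's or
# Aida's code): echoing the raw input bytes round-trips perfectly...
raw = "テスト".encode("utf-8")  # bytes as they might arrive from a browser
echoed = raw                     # returned 1:1, never decoded
assert echoed.decode("utf-8") == "テスト"

# ...but treating those bytes as a string gives wrong answers as soon as
# string semantics are needed.
print(len(raw))                  # 9 bytes, not 3 characters
truncated = raw[:4]              # "first 4 characters" cuts one in half
print(truncated.decode("utf-8", errors="replace"))

# Decoding first gives correct character-level behavior.
text = raw.decode("utf-8")
print(len(text), text[:2])       # 3 テス
```

This is why a correct-looking page alone cannot settle the argument. The difference only shows up once the application compares, measures, or slices the text.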
In reply to this post by Enrico Schwass-2
On 29.03.2009, at 16:32, Enrico Schwass wrote:
> Janko Mivšek <[hidden email]> writes:
>
> Hello Janko
>
> I guess, Phillip talks about in-image japanese/arabic/whatever. This
> needs probably changes to the vm. Here on Mac OS X it doesnt work.

The VMs can provide full Unicode input now, but not all images have been adapted to make use of it. And that is completely separate from Unicode font rendering support in the image.

> I just get empty block-glyphs.

Your image needs to use the UTF-32 Unicode character that recent VMs produce along with the old byte-sized character.

Check that "ActiveHand keyboardInterpreter" is in fact a UTF32InputInterpreter.

> Its not possible to copy non latin characters into the workspace.

Your image needs to make use of the ClipboardExtendedPlugin, which ships in current Mac VMs.

- Bert - |
2009/3/29 Bert Freudenberg <[hidden email]>:
> On 29.03.2009, at 16:32, Enrico Schwass wrote:
>
>> Janko Mivšek <[hidden email]> writes:
>>
>> Hello Janko
>>
>> I guess, Phillip talks about in-image japanese/arabic/whatever. This
>> needs probably changes to the vm. Here on Mac OS X it doesnt work.
>
> The VMs can provide full unicode input now, but not all images have been
> adapted to make use of it. And that is completely separate from unicode font
> rendering support in the image.

I presume that a good Font or FontSet with Unicode support must be in the image for rendering to work correctly. Any link to a good how-to?

>> I just get empty block-glyphs.
>
> Your image needs to use the UTF-32 unicode character that recent VMs produce
> along with the old byte-sized character.
>
> Check that "ActiveHand keyboardInterpreter" is in fact a
> UTF32InputInterpreter.

For images that do not have UTF32InputInterpreter, let me remind you that Bert's and Yoshiki's work is pending at http://bugs.squeak.org/view.php?id=7071 ...

>> Its not possible to copy non latin characters into the workspace.
>
> Your image needs to make use of the ClipboardExtendedPlugin which does ship
> in current Mac VMs.
>
> - Bert - |