This is a newbie question.
I really don't understand the issue with Unicode and the image/VM/OS in Pharo/SqueakVM on Linux and Windows.

I searched the lists for references to utf-8 and the long discussions about

    leadingChar
    encodings
    convert*Encoding methods
    Strings
    WideStrings
    Text
    DisplayText
    WAKom and WAKomEncoded
    keyboard input
    text file input/output
    the VM's -encoding and -textenc options

So, the people that know about these things, can you please give a newbie explanation about this?

What is needed for us users to:

1. Input text by keyboard with accents (and maybe in the language of Mordor too) so that it does not look like a diamond with a ? sign in the image's system browser.
2. Have this text correctly sent to a modern web browser and rendered correctly in utf-8 encoding.
3. Upload a Java message bundle (or a file with characters outside ASCII) through a web browser to a Seaside application, store it temporarily on disk on the server and process it correctly (by opening it with some FileStream class) inside the image.

Or is this really hard to do and I'm asking for the impossible?

I understand that there are performance issues with a full Unicode image/VM but, supposing that the premature optimization rule applies here, what do we need to do to achieve this (maybe utopian) goal?

Why the question? Because I put some strings inside the image (they look fine as I type them). Then I used them as labels in my Seaside app. In the image I typed and saw in a code browser:

    Búsqueda de información

But in the web browser (Firefox) I see:

    B�squeda de informaci�n

If I change the web browser encoding to iso8859-1 I see it correctly.

Now if I evaluate

    'Búsqueda de información' convertToEncoding: 'utf-8'

this gives:

    'Búsqueda de información'

and if I use this weird, uneditable by hand, string as the string for my Seaside app, I correctly see the string in the web browser:

    Búsqueda de información

So, I really don't understand. Should I always write my strings in the image as I want them, use the convertToEncoding: method and use the output as if that were my string?

Well, thanks for your answers.

P.S. I am using Pharo core 1.0, Seaside 2.8 and squeakvm version:

    4.0.3-2202 #1 XShm Sat Apr 17 18:21:07 UTC 2010 gcc 4.4.3

on a 64-bit Debian Linux with a full utf-8 locale:

    miguel@laptop:~/proyectos/azteca$ locale
    LANG=es_MX.UTF-8
    LC_CTYPE="es_MX.UTF-8"
    LC_NUMERIC="es_MX.UTF-8"
    LC_TIME="es_MX.UTF-8"
    LC_COLLATE="es_MX.UTF-8"
    LC_MONETARY="es_MX.UTF-8"
    LC_MESSAGES="es_MX.UTF-8"
    LC_PAPER="es_MX.UTF-8"
    LC_NAME="es_MX.UTF-8"
    LC_ADDRESS="es_MX.UTF-8"
    LC_TELEPHONE="es_MX.UTF-8"
    LC_MEASUREMENT="es_MX.UTF-8"
    LC_IDENTIFICATION="es_MX.UTF-8"
    LC_ALL=

Cheers

--
Miguel Cobá
http://miguel.leugim.com.mx

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
On Jul 19, 2010, at 7:12 AM, Miguel Enrique Cobá Martínez wrote:

> [...]
> So, the people that know about these things, can you please give a newbie
> explanation about this?

I would love to have the possibility to describe the situation so that we can refer to it later.
On 19.07.2010 07:12, Miguel Enrique Cobá Martínez wrote:
> ...
> 2. Have this text correctly sent to a modern web browser and rendered
> correctly in utf-8 encoding.

In Seaside 2.8
- use WAKomEncoded
- configure the application to use utf-8
- do _not_ send #convertToEncoding:

In Seaside 3.0 rc
- set the encoding on the server to utf-8
- do _not_ send #convertToEncoding:

If that doesn't work please post on the Seaside mailing list with:
- the exact Pharo image version you have
- the exact Seaside version you have
- the exact Kom version you have
- whether the String displays correctly in an inspector
- the output of (yourString convertToEncoding: 'utf-8') asByteArray

> 3. Upload a Java message bundle (or a file with characters outside
> ASCII) through a web browser to a Seaside application, store it
> temporarily on disk on the server and process it correctly (by opening
> it with some FileStream class) inside the image.

Java message bundles:
saving:
- open a file stream in binary mode
- save the contents of the upload to the stream
reading:
- open a file stream in text mode with iso-8859-1 as the encoding. This will only work for Latin-1 characters, but Java message bundles support Unicode, so you need to process the Unicode escapes manually [1].

Other files with characters outside ASCII:
saving:
- open a file stream in binary mode
- save the contents of the upload to the stream
reading:
- open a file stream in text mode with the correct encoding for the file. Knowing the right encoding is hard to impossible. Seaside doesn't know it because the browser doesn't tell Seaside. The browser doesn't know it because the operating system doesn't know either.

 [1] http://java.sun.com/docs/books/jls/third_edition/html/lexical.html#3.3

Cheers
Philippe
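The save-in-binary, read-as-iso-8859-1, then process-the-escapes steps for Java message bundles can be sketched as follows. This is Python used only for illustration, not Pharo or Seaside code. The sample bytes, the file name and the helper names (save_upload, read_bundle, decode_unicode_escapes) are all hypothetical, and the escape handling is deliberately simplified: it does not deal with escaped backslashes, surrogate pairs or line continuations.

```python
import os
import re
import tempfile

# Hypothetical upload contents, exactly as the browser posted them: Latin-1
# bytes plus \uXXXX escapes for characters outside Latin-1.
uploaded = b"search.title=B\xfasqueda\ngreeting=\\u4f60\\u597d\n"

_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def decode_unicode_escapes(line):
    """Replace each \\uXXXX escape with the character it names (simplified)."""
    return _ESCAPE.sub(lambda match: chr(int(match.group(1), 16)), line)


def save_upload(path, data):
    # Saving: binary mode, so the uploaded bytes are written unchanged.
    with open(path, "wb") as stream:
        stream.write(data)


def read_bundle(path):
    # Reading: text mode with iso-8859-1, then manual escape processing.
    with open(path, "r", encoding="iso-8859-1") as stream:
        return [decode_unicode_escapes(line.rstrip("\n")) for line in stream]


path = os.path.join(tempfile.mkdtemp(), "messages.properties")
save_upload(path, uploaded)
lines = read_bundle(path)
assert lines == ["search.title=Búsqueda", "greeting=你好"]
```

Writing in binary mode keeps the decision about the encoding out of the save step entirely, so a wrong guess can only affect reading and never corrupts the stored file.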