There has been a lot of excellent work done on Unicode in Squeak.
Unfortunately, not all of it works 'out of the box'.

I've started working on trying to figure out how to enable the basic operations (input, display, and copy/paste) for several languages. With the kind help of several Russian squeakers, I've managed to create relatively simple instructions on how to do so for Russian, which I've posted here: http://wiki.squeak.org/squeak/5773 ; I've personally tested the 3.10 instructions under Linux. However, getting to this point involved days of experimentation, including quite a lot of image hangs. The reason I created the documentation was that, despite reading everything on squeak.org and the mailing list archives about Russian and Cyrillic support, I still couldn't get it working - it took combining the instructions of two people who had had similar experiences, and the patches of two others, to have everything work, although there are some problems with garbage characters with copy/paste sometimes.

A side effect of these instructions is that various accented European characters (such as à, å, ò, ö, ĵ, ñ, ç, č, ć, š, đ, etc - everything I've tried) work. This cursory testing suggests that Italian/French/German/Spanish/Swedish/Esperanto/Slovenian/etc operations also work after following the Russian instructions; they do not work by default. Perhaps more interestingly, Thai and Greek also work; Arabic nearly does, but only displays in its 'general Unicode' form (rather than correctly changing what appears based on the position of each letter in the word: see http://en.wikipedia.org/wiki/Arabic_alphabet for a better description).

I'm now going through the same process for Japanese. Installing 'Japanese Environment Installer' (JEI) from SqueakMap only gave me a working display of Japanese characters, without working input or copy/paste. Possibly interestingly, direct input while using a Japanese keymap (with no input method) to type kana works with the Russian patches, but not with the JEI patches.

Nothing using extra input method software works anywhere in Squeak in any configuration I've tried. Deadkeys also don't work.

I would like to request help from all Squeakers who can input, display, and copy/paste text in languages which have characters or writing systems that don't appear in the languages mentioned above, and especially of those who have everything working with Japanese. Any instructions on how you did it would be greatly appreciated. I intend to keep documenting the simplest ways to currently set things up, which will hopefully have the side effect of making a baseline set of patches to have things Just Work become clear. Any reports of errata in the documentation, platform differences, or bugs are also extremely welcome.

Thank you;
Katerina Barone-Adesi
On 03.09.2008 at 22:44, Katerina Barone-Adesi wrote:

> There has been a lot of excellent work done on Unicode in Squeak.
> Unfortunately, not all of it works 'out of the box'.

We fixed many of these shortcomings in the OLPC Linux VM and image. Keyboard input works with dead keys, compose keys, and XIM. The clipboard works with Unicode, formatted/rich text, and images (this builds on the Sophie guys' work). There is rendering of various text scripts via Pango, including e.g. Devanagari with correct glyph shaping (Nepal is one of the pilot countries).

- Bert -
> We fixed many of these shortcomings in the OLPC Linux VM and image. Keyboard
> input works with dead keys, compose keys, and XIM. Clipboard works with
> unicode, formatted/rich text and images (this builds on the Sophie guys'
> work). There is rendering of various text scripts via Pango, including e.g.
> Devanagari with correct glyph shaping (Nepal is one of the pilot countries).

I've tried the OLPC Squeak image, using the instructions at
http://wiki.laptop.org/go/Etoys#Method_1_.28Easiest.29_-_Use_Squeakland_installation_and_the_OLPC_image
and the link to what it says is the newest image:
http://etoys.laptop.org/src/etoys-image-and-pr.zip . The results were mixed.

Successes:
- Accented Latin characters work immediately.
- Japanese support partially works: after switching to Japanese, fonts are successfully loaded, and Japanese is displayed.

Problems:
- Deadkeys don't work.
- Anthy + SKIM didn't work.
- Typing kana using a Japanese keymap shows up as '?', even when Etoys itself is displayed properly in Japanese.
- Missing fonts and broken 'choose language' support:
-- Typing Arabic only shows '?', and selecting Arabic as a language doesn't change that; nothing seems to happen.
-- Selecting 'Persian' translates the user interface into a series of '?', with a tiny bit of English for the strings that aren't translated; the necessary font is missing, and there is no prompt to add it. This also applies to Singhalese and Urdu. Turkish has a much more minor version of this problem - most letters are there, but a few are missing.
-- Typing Greek only shows '?'. Selecting Greek as a language says "This language needs additional fonts. Do you want to install the fonts? Yes / No". Selecting 'yes' brings up a debugger, because the .sar file for Greek fonts is missing. This also applies to Russian, Chinese, and Romanian.
-- Selecting Korean as a language brings up a debugger: "Error: My subclass should have overridden #leadingChar".

I didn't evaluate Marathi, Mongolian, Nepali, Pushto, or Telugu, since I don't have my system set up to input in them and none of the strings that I happened to have on my screen were translated when I switched to them. I also didn't evaluate copy/paste for anything.

So: the situation is slightly better than what
http://www.nabble.com/3.10---Mac-OS-X-Leopard---accent-chars-and-keyboard-input-td15332280.html
led me to believe. Accented Latin characters, when typed directly (i.e., without deadkeys/input methods), work out of the box in the OLPC image, which I hope makes it into the basic Squeak image soon. However, there are definitely still problems left as well.

If any of this is user error on my part, I'd be very glad. Assuming at least some of it is not, while the OLPC image is a step in the right direction - and probably would make a better baseline than the 'basic' Squeak image - I'm still left with several of the problems my previous email mentioned, and I'd still be very glad to hear about any successes people have had with them. Some of this stuff has clearly worked for some people at some point, which is part of what makes the current situation and its regressions so unexpected.

Regards;
Katerina Barone-Adesi
On 04.09.2008 at 00:36, Katerina Barone-Adesi wrote:

> I've tried the OLPC Squeak image, using the instructions at
> http://wiki.laptop.org/go/Etoys#Method_1_.28Easiest.29_-_Use_Squeakland_installation_and_the_OLPC_image
> and the link to what it says is the newest image:
> http://etoys.laptop.org/src/etoys-image-and-pr.zip . The results were
> mixed.

You also absolutely need the OLPC VM, which has the Pango plugin, the ExtendedClipboard plugin, XIM support, and fixes to the keyboard handling. This will be integrated into the official VM, but that has not been released yet. See http://etoys.laptop.org/

You will find RPMs there, or you have to build the VM yourself from the SVN sources (pay attention to the configure output so that you do not miss building essential plugins).

I failed to mention before that we also made much more of the system translatable, and switched to gettext. You can see the translation progress here (the translation covers not only Etoys but the whole system): https://dev.laptop.org/translate/projects/etoys/

- Bert -
In reply to this post by Katerina Barone-Adesi
At Thu, 4 Sep 2008 00:36:35 +0200,
Katerina Barone-Adesi wrote:
>
> - Deadkeys don't work.
> - Anthy + SKIM didn't work.
> - typing kana by using a Japanese keymap shows up as '?', even when
> etoys is showing up properly in Japanese.

We have an extra option to the VM (we couldn't really clean up the code paths to eliminate it): if you specify -compositioninput to the display-X11 module, you should be able to input Japanese text from SCIM.

> - Missing fonts and broken 'choose language' support:
> --typing Arabic only shows '?', and selecting Arabic as a language
> doesn't change that; nothing seems to happen.
> --selecting 'Persian' translates the user interface into a series of
> '?', with a tiny bit of English for the strings that aren't
> translated; the necessary font is missing, and there is no prompt to
> add it. This also applies to Singhalese and Urdu. Turkish has a much
> more minor version of this problem - most letters are there, but a few
> are missing.
> --typing Greek only shows '?'. Selecting Greek as a language says
> "This language needs additional fonts. Do you want to install the
> fonts? Yes / No". Selecting 'yes' brings up a debugger, because the
> .sar file for Greek fonts is missing. This also applies to Russian,
> Chinese, and Romanian.

With the very latest image (you need to fetch updates) and the Pango library installed, you should see something better than '?' marks.

> -- Selecting Korean as a language brings up a debugger: "Error: My
> subclass should have overridden #leadingChar".

Ah, that should be fixed. Thank you!

-- Yoshiki
I've tried using the OLPC VM [1] and
http://etoys.laptop.org/src/etoys-dev.zip (with all 239 updates). The results are improved, but still mixed:

What works:
- Dead keys (as long as I run squeak -vm-display-X11 -compositioninput etoys-dev.image).

What half-works:
- Japanese input. I can type with Anthy+Skim and see hiragana correctly, but as soon as I press space or enter, everything but the first character turns into gibberish (not Kanji - accented Latin characters and Unicode missing-character boxes). I don't think it's a font problem, as it looks OK while I'm inputting it, and when I switch to Japanese, all of the menus show up properly in Japanese.

What doesn't work:
- Russian input. It mainly shows Unicode boxes as well, even if I change the font to Deja Vu (using World -> Open -> File List, loading the TTF, then World -> Appearance -> System fonts -> Code font, then opening a new workspace).

Running with or without -encoding UTF-8 -textenc UTF-8 seems to make no difference.

Also, oddly, there are many fewer language options: only German, English, Spanish, French, Japanese and Portuguese. The rest have disappeared.

I've prepared a screenshot. The result of typing using a Russian keymap is in the left workspace; the right one shows the result of typing in Japanese with Anthy+Skim. The top two lines on the right are what happened after pressing enter; the properly displayed hiragana are what I see while I'm still typing.

Do you have any advice on how I can troubleshoot this and/or what I'm doing wrong?

[1]: http://etoys.laptop.org/rpms/squeak-vm-3.10-3olpc10.i386.rpm and
squeak -version:
3.10-3 #1 Fri Aug 29 19:30:22 CEST 2008 gcc 4.1.2
Squeak3.10beta of 22 July 2007 [latest update: #7159]
Linux fedora7 2.6.23.17-88.fc7 #1 SMP Thu May 15 00:35:10 EDT 2008
i686 i686 i386 GNU/Linux
default plugin location: /usr/lib/squeak/3.10-3/*.so

Regards;
Katerina Barone-Adesi

Attachment: bad-input-cropped.png (6K)
Thank you for continuing to pursue this and especially for sending out these periodic progress reports. I don't have any particular use for the result at this time, but it is interesting nonetheless and is an issue of importance to many people.

Ken
In reply to this post by Katerina Barone-Adesi
On 06.09.2008 at 16:35, Katerina Barone-Adesi wrote:

> I've tried using the OLPC VM [1] and
> http://etoys.laptop.org/src/etoys-dev.zip (with all 239 updates). The
> results are improved, but still mixed:
>
> What works:
> - dead keys (as long as I run squeak -vm-display-X11 -compositioninput
> etoys-dev.image).
>
> What half-works:
> - Japanese input. I can type with Anthy+Skim and see hiragana
> correctly, but as soon as I press space or enter, everything but the
> first character turns into gibberish (not Kanji - accented Latin
> characters and unicode missing-character boxes). I don't think it's a
> font problem, as it looks ok while I'm inputting it, and when I switch
> to Japanese, all of the menus show up properly in Japanese.
>
> What doesn't work:
> - Russian input. It mainly shows unicode boxes as well,

Were the (few) translated strings correctly displayed in Cyrillic? E.g., in the Supplies flap the book morph should be labeled "книга". If not, then the fonts on your system are not set up correctly (it works fine on the XO). If the Cyrillic is shown there, however, then you might have discovered a bug. Possibly there is a clash between Unicode input and Latin-5 or so ...

> even if I
> change the font to Deja Vu (using World -> Open -> File List, loading
> the TTF, then World -> Appearance -> System fonts -> Code font, then
> opening a new workspace).

That is completely independent of Pango rendering. Pango fonts cannot be selected explicitly yet. This stuff is brand new.

> Running with or without -encoding UTF-8 -textenc UTF-8 seems to make
> no difference.
>
> Also, oddly, there are many less language options: only German,
> English, Spanish, French, Japanese and Portuguese. The rest have
> disappeared.

That is indeed odd. Apparently it does not find the .mo files in the locale directory next to etoys.image.

> I've prepared a screenshot. The result of typing using a Russian
> keymap is in the left workspace; the right one shows the result of
> typing in Japanese with anthy+skim. The top two lines on the right
> are what happened after pressing enter; the properly-displaying
> hiragana are what I see while I'm still typing.
>
> Do you have any advice for how I can troubleshoot this and/or what I'm
> doing wrong?

Hmm. I'm not sure. The VM version is correct. You might try finding out whether the input or the rendering is broken: look at the code points of the generated characters, and try to construct a string to print from Cyrillic Unicode values.

- Bert -
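The check Bert describes (inspect the code points that arrive, then build a known-good Cyrillic string from explicit values) can be illustrated outside Squeak. The following is only a minimal sketch in Python, not Squeak code and not anything taken from the OLPC image; the helper names `describe` and `in_cyrillic_block` are made up for illustration. It builds the word "книга" from its code points and lists what a correctly typed string should contain. If typed text reports other values, input is the more likely culprit; if the values match but the glyphs are wrong, rendering is.

```python
import unicodedata

# Code points for "книга", built explicitly so no keyboard input is involved.
KNIGA_CODE_POINTS = [0x043A, 0x043D, 0x0438, 0x0433, 0x0430]
kniga = "".join(chr(cp) for cp in KNIGA_CODE_POINTS)


def describe(text):
    """Return (character, 'U+XXXX', Unicode name) for each character in text."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


def in_cyrillic_block(ch):
    """True if ch lies in the basic Cyrillic block U+0400..U+04FF."""
    return 0x0400 <= ord(ch) <= 0x04FF


if __name__ == "__main__":
    # Print one line per character: glyph, code point, official name.
    for glyph, code_point, name in describe(kniga):
        print(glyph, code_point, name)
```

Comparing this listing against the values of the characters actually produced by the Russian keymap would separate an input problem from a font or rendering problem.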
In reply to this post by Ken Causey-3
> Thank you for continuing to pursue this and especially for sending out
> these periodic progress reports. I don't have any particular use for
> the result at this time, but it is interesting nonetheless and is an
> issue of importance to many people.

Gladly. In turn, I'd like to thank everyone who's worked on Squeak i18n, and everyone who's helped me try to get a working setup.

An addendum to my previous email: copy-paste behavior is extremely strange with the OLPC VM and updated OLPC image. I can copy text in Russian or Japanese (for instance, from Wikipedia) into Squeak, and it displays perfectly. I can copy the 'gibberish' that I typed and showed in the screenshot in my previous email into other programs, and it displays correctly in the other programs. However, if I try to copy correctly displayed Japanese or Russian from Squeak into other programs, it simply doesn't work: the clipboard seems to be empty, and nothing is pasted when I click 'paste'.

Kat
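One possible reading of these symptoms, which nothing in this thread confirms, is that typed text reaches the image as UTF-8 bytes that are then treated as one Latin-1 character per byte. That would produce accented Latin characters plus unprintable boxes on screen, while the underlying bytes would still be valid UTF-8 for another program. The Python sketch below only illustrates that general mechanism; it says nothing about what the VM or image really does, and the function names are invented for the example.

```python
def misdecode_utf8_as_latin1(text):
    """Encode text as UTF-8, then wrongly read each byte as a Latin-1 character."""
    return text.encode("utf-8").decode("latin-1")


def recover(garbled):
    """Undo the mistake: turn the Latin-1 characters back into bytes, read as UTF-8."""
    return garbled.encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    for sample in ("книга", "かな"):
        garbled = misdecode_utf8_as_latin1(sample)
        # repr() makes the invisible C1 control characters visible as escapes.
        print(repr(sample), "->", repr(garbled), "->", repr(recover(garbled)))
```

Under this assumption each Cyrillic letter becomes two Latin-1 characters and each kana becomes three, some of which are control characters that a font would show as boxes.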
In reply to this post by Bert Freudenberg
>> What doesn't work:
>> - Russian input. It mainly shows unicode boxes as well,
>
> Were the (few) translated strings correctly displayed in Cyrillic? E.g., in
> the Supplies flap the book morph should be labeled "книга".

No, because Russian wasn't one of the language options, so I didn't have any translated strings. However, copying Russian text from Wikipedia into the workspace showed it correctly. Japanese was an option, and the translated strings displayed correctly.

>> Also, oddly, there are many less language options: only German,
>> English, Spanish, French, Japanese and Portuguese. The rest have
>> disappeared.
>
> That is indeed odd. Apparently it does not find the .mo files in the locale
> directory next to etoys.image.

There are no .mo files (and there is no locale directory) in etoys-dev.zip; it contains QuickGuides/index.pr, QuickGuides/preload-index.sexp.data.gz, Gallery.020.pr, EtoysActivity.004.pr, EtoysV3.sources, Tutorials.009.pr, etoys-dev.image, etoys-dev.changes, and po/etoys/, which contains 29 po files.

> Hmm. I'm not sure. The VM version is correct. You might try finding out if
> the input or the rendering is broken - look at the code point of the
> generated characters, and try to construct a string to print made for
> Cyrillic Unicode values.

See my previous email (sent right after you sent this one) - it appears to be input which is broken, because text that I copy from other programs and paste into Squeak renders correctly.

Kat
On 06.09.2008 at 22:45, Katerina Barone-Adesi wrote:

>>> What doesn't work:
>>> - Russian input. It mainly shows unicode boxes as well,
>>
>> Were the (few) translated strings correctly displayed in Cyrillic?
>> E.g., in
>> the Supplies flap the book morph should be labeled "книга".
>
> No, because Russian wasn't one of the language options, so I didn't
> have any translated strings. However, copying text from wikipedia in
> Russian into the workspace showed it correctly. Japanese was an
> option, and the translated strings displayed correctly.
>
>>> Also, oddly, there are many less language options: only German,
>>> English, Spanish, French, Japanese and Portuguese. The rest have
>>> disappeared.
>>
>> That is indeed odd. Apparently it does not find the .mo files in
>> the locale
>> directory next to etoys.image.
>
> There are no .mo files (and there is no locale directory) in
> etoys-dev.zip; it contains QuickGuides/index.pr,
> QuickGuides/preload-index.sexp.data.gz, Gallery.020.pr,
> EtoysActivity.004.pr, EtoysV3.sources, Tutorials.009.pr,
> etoys-dev.image, etoys-dev.changes, and po/etoys/, which contains 29
> po files.

That is correct; you should get etoys-image-and-pr.zip.

>> Hmm. I'm not sure. The VM version is correct. You might try finding
>> out if
>> the input or the rendering is broken - look at the code point of the
>> generated characters, and try to construct a string to print made for
>> Cyrillic Unicode values.
>
> See my previous email (sent right after you sent this one) - it
> appears to be input which is broken, because rendering text I copy
> from other programs and paste into Squeak works correctly.

Thanks. We have not had any feedback from Russian Linux users yet. I added these two issues to the bug tracker:

https://dev.laptop.org/ticket/8339
https://dev.laptop.org/ticket/8340

- Bert -
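The missing language options come down to the difference between .po files (gettext translation sources) and .mo files (the compiled catalogs a program loads at run time). A quick way to see which compiled catalogs sit next to an image is sketched below in Python. This is a hedged illustration only: the "locale directory next to the image" comes from the message above, but the assumption that each language has its own subdirectory under it (as in the usual gettext layout) and the function name `available_translations` are mine, not something this thread states.

```python
from pathlib import Path


def available_translations(image_dir):
    """Return sorted language directory names under <image_dir>/locale
    that contain at least one compiled .mo catalog.

    Assumes one subdirectory per language (an assumption, see above).
    """
    locale_dir = Path(image_dir) / "locale"
    if not locale_dir.is_dir():
        return []
    languages = set()
    for mo_file in locale_dir.rglob("*.mo"):
        parts = mo_file.relative_to(locale_dir).parts
        # Skip stray .mo files lying directly in locale/ with no language dir.
        if len(parts) > 1:
            languages.add(parts[0])
    return sorted(languages)


if __name__ == "__main__":
    # Check the current directory, e.g. the folder holding the image file.
    print(available_translations("."))
```

An empty list for a directory that only has po/etoys/*.po would match what was observed with etoys-dev.zip.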
In reply to this post by Katerina Barone-Adesi
At Sat, 6 Sep 2008 16:35:38 +0200,
Katerina Barone-Adesi wrote:
>
> What half-works:
> - Japanese input. I can type with Anthy+Skim and see hiragana
> correctly, but as soon as I press space or enter, everything but the
> first character turns into gibberish (not Kanji - accented Latin
> characters and unicode missing-character boxes). I don't think it's a
> font problem, as it looks ok while I'm inputting it, and when I switch
> to Japanese, all of the menus show up properly in Japanese.

For this, I made an image-side patch. I can't think of anything that has changed between when I was testing on Fedora 7 (where it worked) and Fedora 9 (where I observed the problem you described), so this behavior may depend on the versions of Anthy and/or SCIM. But it seems to work (and looks logical) with this patch. Please try it and let me know how it goes.

-- Yoshiki

Attachment: jaInputDec8-yo.1.cs (2K)