Hello
I'm back into exploring and learning about
VisualWorks and Smalltalk after a 3-4 year break, and it's great to see how
things have improved: better documentation, screen casts (excellent!) and
blogs.
I live in Poland and am looking to build a simple
translators' dictionary. Unless I'm mistaken, it appears there is no Polish
locale in VW. I'm just starting this project and would appreciate any advice
regarding using Polish characters in such an application.
Thanks
Nick
|
Dear Nick,
> I'm back into exploring and learning about VisualWorks and Smalltalk
> after a 3-4 year break, and it's great to see how things have improved:
> I live in Poland and am looking to build a simple translators' dictionary.

The Web-TCM translation system may be of interest to you. There is an
article on it in the Smalltalk Solutions 2002 report and very brief
references to it in the ESUG2001 report and (as 'WebTCM') in the ESUG2007
report and on page 3 in the CSUG2004 report (all these you can find at
http://www.esug.org/conferences/niallsreport/). The guest login mentioned
in the ESUG2001 report is long out of date, but if you are interested you
can get more up-to-date info from Anhalt university and/or Georg Heeg.

HTH.

Yours faithfully
Niall Ross
|
In reply to this post by Nick Paxford
Hello All,
When I load AllEncodings parcel to a fresh 7.5 image on XP, all the fonts
go small. I'm trying to stream-in a text containing Polish characters; loading
AllEncodings enables me to input Polish in a workspace, but still the
streamed-in text does not display the Polish characters (ł, ż, ź, ć
etc) when I 'show' it on the Transcript window. I think what I'm trying to
do should be possible without creating a Polish Locale (Randy was helpful a
while back and explained how he tweaked an existing installXXLocale method
for Polish although I haven't had any joy with that yet); it's connected with
encoding streams. It seems that my image won't recognize these characters until
I load AllEncodings - does this mean that a standard image doesn't
'contain' such characters? Any pointers, please? Thanks for your patience
:)
Nick
|
Nick Paxford wrote:
> Hello All,
>
> When I load AllEncodings parcel to a fresh 7.5 image on XP, all the
> fonts go small. I'm trying to stream-in a text containing Polish
> characters; loading AllEncodings enables me to input Polish in a
> workspace, but still the streamed-in text does not display the Polish
> characters (ł, ż, ź, ć etc) when I 'show' it on the Transcript window. I
> think what I'm trying to do should be possible without creating a Polish
> Locale (Randy was helpful a while back and explained how he tweaked an
> existing installXXLocale method for Polish although I haven't had any
> joy with that yet); it's connected with encoding streams. It seems that
> my image won't recognize these characters until I load AllEncodings -
> does this mean that a standard image doesn't 'contain' such characters?
> Any pointers, please? Thanks for your patience :)

There are a number of potential issues involved here.

1) You have to use the right encoding to read the file in. By default,
VW will use your current Locale's encoding to read the file. You can
force a different encoding explicitly, e.g. if the file is in iso8859-2,
read it from ('file' asFilename withEncoding: #iso8859_2) readStream.

2) If you have the right characters in memory and they are to be
displayed, the text widget needs to turn the characters into indexes of
the font glyphs corresponding to the characters. This is again an
encoding process. E.g. if you use a font claiming to be iso-8859-2, the
characters need to be encoded back to bytes using that encoding, and the
bytes are used to find the right glyph in the font. Obviously, if there
is a mismatch between the encoding used to get the bytes and the real
encoding of the font, you won't get the right glyphs. The character
encoding is controlled by a TextAttributes instance associated with the
text widget displaying it. Again, there are a bunch of defaults that
ultimately fall back to the Locale, but you can force an arbitrary
encoding if you specify the TextAttributes for the text widget
explicitly.

3) If you get all the above right, it may still happen that the font you
end up using does not have glyphs for all the valid values of the
encoding that it claims to support. This is often the case with the
"unicode" fonts, where the range of values is much larger than with the
iso8859 set, for example. There probably isn't a unicode font that would
support all of them, although I'd expect any reasonable "unicode" font
to contain the Polish glyphs. The completeness of the font differs
significantly between fonts and platforms. So you basically need to find
out which font on your platform has the glyphs you need, and make sure
that's the one you'll end up using.

So, while it is certainly possible to get the characters to show
properly without building the Locale, building the Locale may well be
the easiest way to get there in the end.

I used the following expression to enable a "unicode" font on Linux
(Fedora) with 7.5, to be able to see non-Latin-1 characters properly in
our test suites:

    CharacterEncoder
        installEncoder: UnicodeCharacterEncoder new
        named: #'iso10646-1'
        platform: #unix.
    CharacterEncoder
        installEncoder: UnicodeCharacterEncoder new
        named: #'iso10646-1'
        platform: #win32.
    Locale current preferredEncodings: #('iso10646-1').
    Locale internalSet: Locale current name

I've heard this could kill your image on Windows in some cases, though,
so be careful with it. I should also say that I don't seem to need this
anymore in 7.6 (should be out soon), presumably because we added UTF-8
based Locales for Unix platforms, and recent Linuxes seem to default to
UTF-8, so Unicode becomes the preferred encoding automatically.

HTH,

Martin
_______________________________________________
vwnc mailing list
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
|
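A minimal workspace sketch of point 1 in Martin's reply, for anyone trying this out: it reads a file through an explicitly chosen encoding and shows the result on the Transcript. The file name 'polish.txt' is hypothetical, and #iso8859_2 is only the right choice if the file really is stored as ISO 8859-2. Getting the characters into memory correctly does not guarantee they display correctly; that still depends on the TextAttributes and font issues described in points 2 and 3.

```smalltalk
| stream text |
"Hypothetical file name; the encoding symbol is an assumption about how the file was saved."
stream := ('polish.txt' asFilename withEncoding: #iso8859_2) readStream.

"Read the whole file, making sure the stream is closed even if reading fails."
text := [stream upToEnd] ensure: [stream close].

"Display depends on the Transcript's font actually having the Polish glyphs."
Transcript show: text; cr.
```

Evaluating `text inspect` instead of showing the text can help separate a decoding problem (wrong characters in memory) from a display problem (right characters, wrong glyphs).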