Hi -

We're looking into localizing our products at Teleplace and I'm looking around for options in this area. If I understand it correctly, Etoys has been localized into many languages. What I'm curious about is what the localization process looks like in practice:

* What needs to be done to support localization in Etoys when writing code? I.e., historically all that was needed was a call to #translated, but I'm not sure if that's still the case with the move to "standard" solutions.
* What are the tools to create new localizations of Etoys? I.e., what does a translator start with as input and what is created as the result? What tools are used to create one from the other? How do you check for completeness?
* How does localized deployment work with Etoys? I.e., what are the options for providing localized downloads vs. downloading all supported locales and switching dynamically upon startup?
* Generally speaking, how do people feel about localization in Etoys? Is it considered to work well, or is it considered to be a painful process? Are there any obvious alternatives one should look at?

Thanks for any help and pointers you can provide.

Cheers,
  - Andreas
At Mon, 18 Jan 2010 21:59:55 -0800,
Andreas Raab wrote:
>
> Hi -
>
> We're looking into localizing our products at Teleplace and I'm looking around for options in this area. If I understand it correctly, Etoys have been localized in many languages. What I'm curious about is what the localization process looks like in practice:
>
> * What needs to be done to support localization in Etoys when writing code? I.e., historically all that was needed was a call to #translated, but I'm not sure if that's still the case with the move to "standard" solutions.

We use #translated for string literals, and it does the translation at runtime. But the way the system collects these literals is different from the past. GetTextExporter2 scans the image, finds all senders of #translated and of another selector, #translatedNoop, which is attached to a collection of strings, and collects their literals. As the name suggests, GetTextExporter2 creates the de facto standard gettext .pot file.

(BTW, the work described above and below was mostly done by Korakurider. Big thanks to him!)

> * What are the tools to create new localizations of Etoys? I.e., what does a translator start with as input and what is created as the result? What tools are used to create one from the other? How do you check for completeness?

The generated .pot file is uploaded to an online translation site that uses Pootle. The volunteers provide the translations, and at release build time we collect them and compile them into .mo files. GetTextTranslator, a subclass of NaturalLanguageTranslator, opens the .mo file and looks up the translation of a given string in it.

The upside of this is that the volunteer translators don't have to know the context in which these phrases are used; they just go to the web site and translate them all. And Pootle is what the Sugar and OLPC people used, so being able to use the same mechanism was a big plus.
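As a rough illustration of the export step (not GetTextExporter2 itself, which works on the methods in the image), the Python sketch below scans plain Smalltalk source text for string literals directly followed by #translated or #translatedNoop and writes a bare-bones .pot template. Treating the source as text, and ignoring Smalltalk comments and character literals, are simplifying assumptions; the function names are hypothetical.

```python
import re

# Simplification (assumption): the real exporter walks methods in the image;
# here we scan plain Smalltalk source text. Comments ("...") and character
# literals ($') are not handled.
LITERAL = re.compile(r"'((?:[^']|'')*)'")          # '' inside a literal is an escaped quote
MARKER = re.compile(r"\s*translated(?:Noop)?\b")   # #translated or #translatedNoop


def extract_marked_strings(source: str) -> list[str]:
    """Return unique literals followed by a marker selector, in first-seen order."""
    seen: dict[str, None] = {}
    for literal in LITERAL.finditer(source):        # literals are consumed left to right
        if MARKER.match(source, literal.end()):
            seen.setdefault(literal.group(1).replace("''", "'"), None)
    return list(seen)


def po_quote(text: str) -> str:
    """Quote a string with the C-style escapes used in .po/.pot files."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\t", "\\t"))
    return f'"{escaped}"'


def write_pot(msgids: list[str]) -> str:
    """Build a minimal .pot template: a header entry plus an empty msgstr per msgid."""
    lines = ['msgid ""', 'msgstr ""',
             '"Content-Type: text/plain; charset=UTF-8\\n"', ""]
    for msgid in msgids:
        lines += [f"msgid {po_quote(msgid)}", 'msgstr ""', ""]
    return "\n".join(lines)


source = "label := 'Start' translated. x := 'plain'. y := 'it''s' translated."
print(write_pot(extract_marked_strings(source)))
```

A real .pot header carries more metadata (project name, creation date, and so on) and usually `#:` source-reference comments; only the Content-Type line is included here.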
The downside is that they translate without context, so a phrase with multiple meanings may not get translated properly. Also, the same phrase may be used in different ways, and not being able to translate each use differently is a problem. We thought of several ways to solve this... One was to modify these phrases in the source code (e.g. changing "start" to "start (verb)" and "start (noun)"), but that would have invalidated a lot of volunteer work and required providing an English translation as well. If Pootle were more flexible, we could have attached an annotation to each phrase to indicate its use (which still would have required source changes), but that didn't happen. Splitting the phrases into different text domains was another possibility, and it is good for other reasons too (Korakurider even has the code), but it didn't happen for various reasons.

(I thought we started out with 4,000 or so phrases for Etoys, but now there appear to be 27,000 or so. Not sure what that means...)

> * How does localized deployment work with Etoys? I.e., what are the options for providing localized downloads vs. downloading all supported locales and switching dynamically upon startup?

In general, we bundle all the .mo files in a single release. Upon startup, GetTextTranslator scans the specified directory for available languages and shows them in the menu (and tries to switch to the system language).

> * Generally speaking, how do people feel about localization in Etoys? Is it considered to work well, or is it considered to be a painful process? Are there any obvious alternatives one should look at?

This is subjective, but I thought the process overall worked pretty well, given that the volunteers are all over the place. There were some things that didn't go as nicely as one would hope. For some languages, even though the translation was provided, the work was unfortunately wasted because the required text rendering was only available on Unix (on the XO it is kind of okay, but the UI layout for RTL languages is still not there).
For many languages, where there are not many Etoys users and the translators don't actually use Etoys, I have to think that the quality of the translation may not be optimal. Pootle was quite slow and required a lot of manual intervention, and merging the huge translation files for Etoys was indeed a pain (I don't know much about what happened behind the scenes here, but it often took weeks to get it done).

The bottom line: I'd think that going through the gettext mechanism, possibly with nicely annotated strings in the code, would be okay. One might want to go for a "system resource strings"-style convention, where you only write string IDs in the source code and keep the strings separate from it, but to me that looks like just another form of annotation. For work inside a company, people can edit .po files with an existing editor, and there you can embed comments to show the usage.

-- Yoshiki
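The startup behavior described above (bundle every .mo file, discover the available languages, try the system language first) might look roughly like this Python sketch. The `<locale_dir>/<lang>/LC_MESSAGES/<domain>.mo` layout is the conventional gettext one and is assumed here, as is the domain name "etoys"; GetTextTranslator's actual directory layout and language detection may differ.

```python
import os
from pathlib import Path
from typing import Optional


def available_languages(locale_dir: Path, domain: str) -> list[str]:
    """Language codes that have <locale_dir>/<lang>/LC_MESSAGES/<domain>.mo."""
    if not locale_dir.is_dir():
        return []
    return sorted(entry.name for entry in locale_dir.iterdir()
                  if (entry / "LC_MESSAGES" / f"{domain}.mo").is_file())


def system_language() -> Optional[str]:
    """First language named in the usual POSIX locale environment variables."""
    for var in ("LANGUAGE", "LC_ALL", "LC_MESSAGES", "LANG"):
        value = os.environ.get(var)
        if value:
            return value.split(":")[0]   # LANGUAGE may hold a list such as "de:en"
    return None


def pick_language(available: list[str], preferred: Optional[str],
                  default: str = "en") -> str:
    """Try e.g. 'pt_BR' and then 'pt' for 'pt_BR.UTF-8'; otherwise use the default."""
    if preferred:
        code = preferred.split(".")[0].split("@")[0]   # drop encoding and modifier
        for candidate in (code, code.split("_")[0]):
            if candidate in available:
                return candidate
    return default


if __name__ == "__main__":
    langs = available_languages(Path("locale"), "etoys")   # hypothetical directory
    print(langs, "->", pick_language(langs, system_language()))
```

Shipping all catalogs and choosing at startup keeps a single download; a per-locale download would simply contain fewer language directories.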
Hi Yoshiki -

A couple more questions:

> The created .pot file is uploaded to an online translation site that uses Pootle. The volunteers provides the translation and at the release build time, we collect the translation, compile them to the .mo files. GetTextTranslator, a subclass of NaturalLanguageTranslator, opens the .mo file and looks up the translation of given string from it.

Can you point me to the site where the translations are hosted?

> Also, a phrase is used in different way and not being able to translate it differently is a problem. We thought of several ways to solve this problem... One was to modify these words in the source code (e.g. a phrase like "start" to "start (verb)" and "start (noun)") but it would have resulted in invalidating a lot of volunteer work and having to provide the English translation. If Pootle was flexible, we could have an annotation to each phrase to indicate its use (still would have required source change), but didn't happen. Splitting the phrases into different text domains was another possibility and it is good for other reason (korakurider has the code even) but didn't happen for various reasons.

I'm not sure what "splitting this into different text domains" means in this context. How much of a problem is the issue of translating phrases differently in practice? Does it happen to pretty much everyone right away, or is it an occasional gotcha that people just work around as best they can?

> (I thought we started out from 4,000 or such phrases for Etoys, but now it appears to have 27,000 or such. Not sure what it means...)

It's probably just more coverage.

>> * How does localized deployment work with Etoys? I.e., what are the options for providing localized downloads vs. downloading all supported locales and switching dynamically upon startup?
>
> In general, we bundle these .mo files in the single release. Upon startup, GetTextTranslator scans the specified directory for available languages and show them in the menu. (And trys to switch to the system language.)

What are .mo files? How do they relate to .pot files?

>> * Generally speaking, how do people feel about localization in Etoys? Is it considered to work well, or is it considered to be a painful process? Are there any obvious alternatives one should look at?
>
> This is subjective, but I thought the process overall worked pretty good, given that volunteers all over the place.

That's kind of what I was asking for. Put differently, if you had the choice between the current approach and some alternative, would you drop the current version, no questions asked, or would you likely say "you know what, it's worked for us"? From the sound of it, it's the latter.

> At the bottom line, I'd think that going through the gettext mechanism, possibly with nicely annotated strings in the code would be okay. One would want to go for the "system resource strings"-ish convention where you only write string IDs in the source code and separate strings from the source code, but it looks just annotations to me. For a company work, the people can edit .po files by some existing editor and there you can embed comment to show the usage.

Thanks, this is all very useful info!

Cheers,
  - Andreas
At Mon, 18 Jan 2010 23:20:17 -0800,
Andreas Raab wrote:
>
> Hi Yoshiki -
>
> A couple more questions:
> > The created .pot file is uploaded to an online translation site that uses Pootle. The volunteers provides the translation and at the release build time, we collect the translation, compile them to the .mo files. GetTextTranslator, a subclass of NaturalLanguageTranslator, opens the .mo file and looks up the translation of given string from it.
>
> Can you point me to the site where the translations are hosted?

It is here:

http://translate.sugarlabs.org/

> > Also, a phrase is used in different way and not being able to translate it differently is a problem. We thought of several ways to solve this problem... One was to modify these words in the source code (e.g. a phrase like "start" to "start (verb)" and "start (noun)") but it would have resulted in invalidating a lot of volunteer work and having to provide the English translation. If Pootle was flexible, we could have an annotation to each phrase to indicate its use (still would have required source change), but didn't happen. Splitting the phrases into different text domains was another possibility and it is good for other reason (korakurider has the code even) but didn't happen for various reasons.
>
> I'm not sure what "splitting this into different text domains" means in this context. How much of a problem is the issue of translating phrases differently in practice? Does it happen to pretty much everyone right away or is that an occasional gotcha that people just work around the best they can?

The text domain is a feature of gettext. We still put all phrases into one domain, but if there were a good boundary along which to split the phrases, splitting would have been useful for easing the Pootle server workload, for the volunteers' perception, and for implicitly disambiguating some phrases.

But this kind of disambiguation was not a big problem compared to the lack of inflection support, it seems. Workarounds included changing one occurrence of 'start' to ' start ' (with surrounding spaces) and providing different translations for 'start' and ' start ', etc., but real support for plurals (and gender... which is much harder) would have been good.

> > (I thought we started out from 4,000 or such phrases for Etoys, but now it appears to have 27,000 or such. Not sure what it means...)
>
> It's probably just more coverage.

IIRC, it was around 10,000 or so at one point. Since the extraction has been automatic, it's not clear to me why it increased from there...

> >> * How does localized deployment work with Etoys? I.e., what are the options for providing localized downloads vs. downloading all supported locales and switching dynamically upon startup?
> >
> > In general, we bundle these .mo files in the single release. Upon startup, GetTextTranslator scans the specified directory for available languages and show them in the menu. (And trys to switch to the system language.)
>
> What are .mo files? How do they relate to .pot files?

A .pot file is the template, in text format. A volunteer edits it with a generic editor or a special editor for .po files to produce a .po file, which is in the same text format but with the translations interleaved. Then a command compiles the .po file into a binary file (a .mo file) for faster access.

> >> * Generally speaking, how do people feel about localization in Etoys? Is it considered to work well, or is it considered to be a painful process? Are there any obvious alternatives one should look at?
> >
> > This is subjective, but I thought the process overall worked pretty good, given that volunteers all over the place.
>
> That's kind of what I was asking for. Put differently, if you had the choice between the current approach and some alternative, would you drop the current version no questions asked, or would you likely say "you know what, it's worked for us". From the sound of it it's the latter.

Yes. For Etoys, if we had incorporated Korakurider's other attempts to disambiguate phrases earlier, it would have been better, but even without them it provided a usable system. I had some reservations when we moved to gettext; I didn't think getting translations from people who don't know Etoys was a good idea, but the scalability of the workflow paid off.

Again, for Teleplace the translation will be done by people who know the application, but not necessarily by people who can open the Squeak System Browser and look at the code. In that setting, going to an external tool makes sense, and then gettext is pretty solid. Supporting more gettext features, such as plurals, would be a plus.

> Thanks this is all very useful info!

No problem!

-- Yoshiki

I'm staying up later than usual, as I realized that the script of the Avatar movie is available online. It clears up some of the questions I had...
Yoshiki Ohshima wrote:
> At Mon, 18 Jan 2010 23:20:17 -0800, Andreas Raab wrote:
>> Can you point me to the site where the translations are hosted?
>
> It is here:
>
> http://translate.sugarlabs.org/

Thanks!

> The text domain is a feature of gettext. We still stick all phrases into one domain but if there was a good boundary to split the phrases, it would have been useful for easing Pootle server workload, volunteers perception, and implicitly disambiguate some phrases.

Interesting. I'll have to read up on this (which is fine; this is mostly an information-gathering exercise, and all these pointers about things you've tried and haven't are incredibly valuable).

> But this kind of disambiguation was not big, compared to the lack of inflection support it seems. Workarounds included to edit one occurence of 'start' with ' start ' and provide different translations for 'start' and ' start ' etc. but real support for plural (and gender... which is much harder) would have been good.

But that's weird. I mean, it's not like you could translate a single word without context anyway. Aren't you translating entire phrases? If the phrases are reasonably complete, I'd think this shouldn't cause a significant problem (yes, there are probably situations where you need to know about the subject matter, e.g., 'Start {1}?' may be translated differently depending on whether the argument is the name of a thing or the name of an activity, but those should be rare, no?). Can you give an example or two where you found this to be a problem?

> Again for Telespace, the translation will be done by people who know the application, but not necessarily people who can open the Squeak System Browser and look at the code. In that setting, going to an external tool makes sense and then gettext is pretty solid.

Yes, exactly. The goal is that we can give a handful of files to people who do this for a living, get them back, and just ship them.

> Supporting more gettext features such as plural support etc. would be a plus.

I'll check it out.

> I'm staying up later than usual, as I realized that the script of the Avatar movie is available online. It clears up some of the questions I had...

Heh, heh. I'd say it gets best director, best movie, and special effects for sure. I found the plot a bit ... predictable (well, it *is* a Hollywood movie), but it was definitely enjoyable and the effects were first class. Long, though.

Cheers,
  - Andreas
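Andreas's argument that whole phrases carry their own context can be made concrete: if the entire template, placeholders included, is the message ID, the translator can also reorder the words freely. This Python sketch uses a hypothetical in-memory catalog and hypothetical helper names; `format_positional` assumes Squeak-style 1-based `{1}`, `{2}` placeholders.

```python
import re

# Hypothetical German catalog entries; in practice these would come from a .mo file.
CATALOG = {
    "Start {1}?": "{1} starten?",
    "Copy {1} to {2}": "{1} nach {2} kopieren",
}


def translated(phrase: str) -> str:
    """Stand-in for #translated: look up the whole phrase, else return it unchanged."""
    return CATALOG.get(phrase, phrase)


def format_positional(template: str, *args: str) -> str:
    """Fill 1-based {1}, {2}, ... placeholders (assumed to mirror Squeak's String>>format:)."""
    return re.sub(r"\{(\d+)\}", lambda m: args[int(m.group(1)) - 1], template)


# The translator controls word order because the whole template is one msgid.
print(format_positional(translated("Start {1}?"), "Projekt"))   # Projekt starten?
```

Building the same label by concatenating separately translated fragments would instead lock every language into English word order.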
At Tue, 19 Jan 2010 00:32:13 -0800,
Andreas Raab wrote:
>
> > But this kind of disambiguation was not big, compared to the lack of inflection support it seems. Workarounds included to edit one occurence of 'start' with ' start ' and provide different translations for 'start' and ' start ' etc. but real support for plural (and gender... which is much harder) would have been good.
>
> But that's weird. I mean it's not like you could translate a single word without context anyway. Aren't you translating entire phrases? If the phrases are reasonably complete this shouldn't cause a significant problem I'd think (yes, there are probably situations where you need to know about the subject matter i.e., 'Start {1}?' may be different depending on whether the argument is the name of a thing or the name of an activity but those should be rare, no?). Can you give an example or two where you found this to be a problem?

The earliest example in the relatively recent discussion starts here:

http://lists.laptop.org/pipermail/localization/2008-March/000726.html

and

http://tracker.squeakland.org/browse/SQ-139

-- Yoshiki

> Heh, heh. I'd say it gets best director, best movie, and special effects for sure. I found the plot a bit ... predictable (well, it *is* a Hollywood movie) but it was definitely enjoyable and the effects were first class. Long though.

As for "a bit ... predictable", I had also gathered that from reviews before seeing it (and it is, in fact)... But all the stereotypical stories were already told thousands of years ago, after all. I'd say that if it is done nicely, there's nothing wrong with picking such a "king's road" story...
On 19.01.2010, at 09:14, Yoshiki Ohshima wrote:
>
> At Mon, 18 Jan 2010 23:20:17 -0800,
> Andreas Raab wrote:
>>
>> Hi Yoshiki -
>>
>> A couple more questions:
>>> The created .pot file is uploaded to an online translation site that uses Pootle. The volunteers provides the translation and at the release build time, we collect the translation, compile them to the .mo files. GetTextTranslator, a subclass of NaturalLanguageTranslator, opens the .mo file and looks up the translation of given string from it.
>>
>> Can you point me to the site where the translations are hosted?
>
> It is here:
>
> http://translate.sugarlabs.org/
>
>>> Also, a phrase is used in different way and not being able to translate it differently is a problem. We thought of several ways to solve this problem... One was to modify these words in the source code (e.g. a phrase like "start" to "start (verb)" and "start (noun)") but it would have resulted in invalidating a lot of volunteer work and having to provide the English translation. If Pootle was flexible, we could have an annotation to each phrase to indicate its use (still would have required source change), but didn't happen. Splitting the phrases into different text domains was another possibility and it is good for other reason (korakurider has the code even) but didn't happen for various reasons.
>>
>> I'm not sure what "splitting this into different text domains" means in this context. How much of a problem is the issue of translating phrases differently in practice? Does it happen to pretty much everyone right away or is that an occasional gotcha that people just work around the best they can?
>
> The text domain is a feature of gettext. We still stick all phrases into one domain but if there was a good boundary to split the phrases, it would have been useful for easing Pootle server workload, volunteers perception, and implicitly disambiguate some phrases.
>
> But this kind of disambiguation was not big, compared to the lack of inflection support it seems. Workarounds included to edit one occurence of 'start' with ' start ' and provide different translations for 'start' and ' start ' etc. but real support for plural (and gender... which is much harder) would have been good.

See here for an example of context use:

http://www.gnu.org/software/gettext/manual/gettext.html#Contexts

In fact, you should skim the whole gettext manual. It's the de facto standard for localization in the open-source world.

>>> (I thought we started out from 4,000 or such phrases for Etoys, but now it appears to have 27,000 or such. Not sure what it means...)
>>
>> It's probably just more coverage.
>
> IIRC, I thought it was around 10,000 or such at one point. Since it's been automatic, the increase from there is not clear to me why...

It's even simpler than that: some tools count strings, others count words. Etoys currently has 4412 translatable strings with 27454 words in them.

>>>> * How does localized deployment work with Etoys? I.e., what are the options for providing localized downloads vs. downloading all supported locales and switching dynamically upon startup?
>>>
>>> In general, we bundle these .mo files in the single release. Upon startup, GetTextTranslator scans the specified directory for available languages and show them in the menu. (And trys to switch to the system language.)
>>
>> What are .mo files? How do they relate to .pot files?
>
> .pot is the template file in text format. A volunteer edits the file with a generic or special editor for editing .pot to make a .po file, which is in the same text format but with translation interleaved. Then a command compiles a .po file to a binary file (.mo file) for faster access.

The "msgfmt" program compiles these. For example, it puts a table at the file's beginning for rapid lookup of strings. We don't read the whole file at startup, but load strings on demand.

>>>> * Generally speaking, how do people feel about localization in Etoys? Is it considered to work well, or is it considered to be a painful process? Are there any obvious alternatives one should look at?
>>>
>>> This is subjective, but I thought the process overall worked pretty good, given that volunteers all over the place.
>>
>> That's kind of what I was asking for. Put differently, if you had the choice between the current approach and some alternative, would you drop the current version no questions asked, or would you likely say "you know what, it's worked for us". From the sound of it it's the latter.
>
> Yes. For Etoys, if we had incoporated korakurider's other attempts to disambiguate phrases earlier it would have been better but without such it provided a usable system. I had some reservations earlier when going to gettext; I didn't see the idea of getting translations from people who don't know Etoys a good one, but the scalability of workflow paid off.
>
> Again for Telespace, the translation will be done by people who know the application, but not necessarily people who can open the Squeak System Browser and look at the code. In that setting, going to an external tool makes sense and then gettext is pretty solid.
>
> Supporting more gettext features such as plural support etc. would be a plus.
>
>> Thanks this is all very useful info!
>
> No problem!

Have to agree with Yoshiki: overall it worked pretty well for us.

One thing we tried and abandoned, but might reinstate, is splitting the single large Etoys .pot into several. There already is support for that: a string is first looked up in the domain file named after the Squeak package of the method sending #translated. If it is not found there, the default translation file is used (in our case, the single Etoys file). This allows per-package translation files, and IIRC it worked fine for Hilaire's DrGeoII.

It would be very beneficial for Etoys because it would allow translators to prioritize their work. We could have a file with all the tile translations, one with the UI strings easily visible in Etoys, and then "the rest", which covers all the Smalltalk tools, hidden dialogs, etc. The single file we have is too daunting.

- Bert -
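The per-package scheme Bert describes (look in the domain named after the package, then fall back to the single default domain) corresponds to gettext's fallback chaining. The Python sketch below relies on the standard library's `gettext.NullTranslations` and its `add_fallback` method; `CatalogTranslations`, the domain names, and the German strings are illustrative stand-ins for compiled .mo files, not the Etoys implementation.

```python
import gettext


class CatalogTranslations(gettext.NullTranslations):
    """In-memory stand-in for one compiled .mo domain (illustrative only)."""

    def __init__(self, catalog: dict[str, str]):
        super().__init__()
        self._table = dict(catalog)

    def gettext(self, message: str) -> str:
        if message in self._table:
            return self._table[message]
        # NullTranslations.gettext asks the fallback, or returns the message as-is.
        return super().gettext(message)


def translator_for_package(package: str, domains: dict[str, dict[str, str]],
                           default_domain: str = "etoys") -> gettext.NullTranslations:
    """Package-named domain first, then the default domain, then the English text."""
    default = CatalogTranslations(domains.get(default_domain, {}))
    if package not in domains:
        return default
    per_package = CatalogTranslations(domains[package])
    per_package.add_fallback(default)
    return per_package


domains = {
    "etoys": {"Open": "Öffnen", "Close": "Schließen"},
    "DrGeoII": {"Open": "Figur öffnen"},   # hypothetical package-specific override
}
t = translator_for_package("DrGeoII", domains)
print(t.gettext("Open"), "/", t.gettext("Close"))   # Figur öffnen / Schließen
```

Because the default domain still answers everything a package file lacks, translators could start with the small, high-priority files without breaking the rest of the UI.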
On 19.01.2010, at 10:50, Bert Freudenberg wrote:
> We don't read the whole file at startup, but load strings on demand.

Oops, actually we do load the whole file (though on demand), as I was just reminded. If someone needs details on the actual implementation used in Etoys, Korakurider would be the best person to ask.

- Bert -
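To make the .pot/.po/.mo pipeline discussed in this thread concrete, here is a minimal Python sketch of a msgfmt-like step. It parses simple singular .po entries (leaving out untranslated ones, as msgfmt does by default) and serializes them into the little-endian .mo layout: a 28-byte header, then (length, offset) tables for the original and translated strings, then the NUL-terminated strings, with the originals sorted. It writes no hash table and ignores msgctxt and plural entries. The result is loaded with Python's standard `gettext.GNUTranslations`; for real work, use GNU msgfmt.

```python
import gettext
import io
import re
import struct

_ESCAPES = {"n": "\n", "t": "\t", '"': '"', "\\": "\\"}


def _unquote(token: str) -> str:
    """Turn one quoted .po string token into its text value."""
    body = token.strip()[1:-1]
    return re.sub(r"\\(.)", lambda m: _ESCAPES.get(m.group(1), m.group(1)), body)


def parse_po(text: str) -> dict[str, str]:
    """Collect singular msgid/msgstr pairs (with continuation lines); skip comments."""
    entries: list[dict[str, str]] = []
    current: dict[str, str] = {}
    field = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        keyword, _, rest = line.partition(" ")
        if keyword == "msgid":
            if "msgstr" in current:            # a new entry begins
                entries.append(current)
                current = {}
            field = "msgid"
            current[field] = _unquote(rest)
        elif keyword == "msgstr":
            field = "msgstr"
            current[field] = _unquote(rest)
        elif line.startswith('"') and field:   # continuation of the previous field
            current[field] += _unquote(line)
    if "msgstr" in current:
        entries.append(current)
    # Like msgfmt, leave untranslated entries (empty msgstr) out of the catalog.
    return {e["msgid"]: e["msgstr"] for e in entries if e["msgstr"]}


def build_mo(messages: dict[str, str]) -> bytes:
    """Serialize to the GNU .mo layout: little-endian, no hash table."""
    keys = sorted(messages)   # originals are sorted so a reader can binary-search them
    originals = [k.encode("utf-8") for k in keys]
    translations = [messages[k].encode("utf-8") for k in keys]
    n = len(keys)
    data_start = 28 + 16 * n  # 7-word header + two tables of n (length, offset) pairs
    tables, blob = b"", b""
    for group in (originals, translations):
        for s in group:
            tables += struct.pack("<2I", len(s), data_start + len(blob))
            blob += s + b"\0"  # NUL-terminated; the stored length excludes the NUL
    header = struct.pack("<7I", 0x950412DE, 0, n, 28, 28 + 8 * n, 0, 0)
    return header + tables + blob


PO_TEXT = r'''
# German (illustrative)
msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\n"

msgid "start"
msgstr "Start"

msgid "Open a new "
"project"
msgstr "Neues Projekt öffnen"

msgid "untranslated"
msgstr ""
'''

catalog = gettext.GNUTranslations(io.BytesIO(build_mo(parse_po(PO_TEXT))))
print(catalog.gettext("Open a new project"))   # Neues Projekt öffnen
```

The header entry (empty msgid) sorts first and tells the reader which charset to decode the remaining strings with.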
On Tuesday 19 January 2010 03:20:30 pm Bert Freudenberg wrote:
> It's even simpler than that - some tools count strings, some words. Etoys currently has 4412 translatable strings with 27454 words in them.

5k is still a lot of strings to localize in one shot. The problem with coding in English first and then farming out the batch of strings for localization is that localization will always lag, in both time and quality.

Why not allow users to translate strings on the fly using a hot key? If I see a button label or error message in English, I should be able to use a hot key (translateIt) to pop up a translator that allows me to edit the translations in that context. If the base string changes, existing translations could be flagged as obsolete but still left in place. Obsolete translations could be displayed in a different style (gray?), or dropped in favor of a current default translation, to alert users to the need to retranslate. Even non-programmers could participate as peers during the development phase, for a simultaneous multilingual release.

This is difficult to do in other languages but should be feasible in a dynamic language like Smalltalk.

Subbu
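Subbu's proposal implies that each translation must remember the English text it was made for, so that it can be flagged rather than silently lost when that text changes. Here is a minimal Python sketch of that bookkeeping, under the assumption that entries are keyed by a stable UI identifier rather than by the English string (gettext, by contrast, uses the English string itself as the key); `LiveCatalog` and its methods are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    source: str        # the English text the translation was made for
    translation: str


@dataclass
class LiveCatalog:
    """Hypothetical in-image catalog keyed by a stable UI identifier."""
    entries: dict[str, Entry] = field(default_factory=dict)

    def edit(self, key: str, source: str, translation: str) -> None:
        """What a 'translateIt' hot key might call after the user edits the text."""
        self.entries[key] = Entry(source, translation)

    def lookup(self, key: str, current_source: str) -> tuple[str, bool]:
        """Return (text, is_obsolete); an outdated translation is kept but flagged."""
        entry = self.entries.get(key)
        if entry is None:
            return current_source, False
        return entry.translation, entry.source != current_source


catalog = LiveCatalog()
catalog.edit("menu.open", "Open", "Öffnen")
text, obsolete = catalog.lookup("menu.open", "Open project")   # English label changed
print(text, "(obsolete)" if obsolete else "")
```

The UI could then render flagged text in gray, or substitute the current English string, as the message suggests.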
At Tue, 19 Jan 2010 10:50:30 +0100,
Bert Freudenberg wrote:
>
> See here for an example of context use:
>
> http://www.gnu.org/software/gettext/manual/gettext.html#Contexts
>
> In fact you should skim the whole gettext manual. It's the de-facto standard for localization in the open-source world.

Sure (and sorry for the wrong terminology regarding context and domain). I was referring to the supporting code in Squeak for these features. We can't translate plurals and singulars nicely yet.

-- Yoshiki
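For comparison, here is how the two gettext features in question, message contexts and plural forms, resolve at runtime in Python's standard `gettext` module (`pgettext` needs Python 3.8+). This is not Squeak code; per Yoshiki, the supporting code there is still missing. In a .po file these entries are written with `msgctxt` and `msgid_plural`/`msgstr[n]`; the tiny .mo writer and the German strings below are illustrative only.

```python
import gettext
import io
import struct


def build_mo(entries: dict[bytes, bytes]) -> bytes:
    """Tiny GNU .mo writer (little-endian, no hash table); keys are raw msgid bytes."""
    keys = sorted(entries)
    n = len(keys)
    data_start = 28 + 16 * n
    tables, blob = b"", b""
    for group in (keys, [entries[k] for k in keys]):
        for s in group:
            tables += struct.pack("<2I", len(s), data_start + len(blob))
            blob += s + b"\0"
    return struct.pack("<7I", 0x950412DE, 0, n, 28, 28 + 8 * n, 0, 0) + tables + blob


HEADER = ("Content-Type: text/plain; charset=UTF-8\n"
          "Plural-Forms: nplurals=2; plural=(n != 1);\n")

# In the .mo file a context is stored as b"context\x04msgid", and a plural entry
# as b"singular\0plural" mapping to NUL-separated forms (German examples).
mo = build_mo({
    b"": HEADER.encode("utf-8"),
    b"verb\x04start": "starten".encode("utf-8"),
    b"noun\x04start": "Start".encode("utf-8"),
    b"%d object\0%d objects": "%d Objekt\0%d Objekte".encode("utf-8"),
})
translations = gettext.GNUTranslations(io.BytesIO(mo))

print(translations.pgettext("verb", "start"))                   # starten
print(translations.ngettext("%d object", "%d objects", 3) % 3)   # 3 Objekte
```

With context support, the ' start ' workaround mentioned earlier in the thread becomes unnecessary: the two uses of "start" simply get different msgctxt values, and the Plural-Forms header lets each language declare its own number of plural forms.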