Localization for Squeak products

Andreas.Raab

Localization for Squeak products

Hi -

We're looking into localizing our products at Teleplace and I'm looking
around for options in this area. If I understand it correctly, Etoys
has been localized into many languages. What I'm curious about is what
the localization process looks like in practice:

* What needs to be done to support localization in Etoys when writing
code? I.e., historically all that was needed was a call to #translated,
but I'm not sure if that's still the case with the move to "standard"
solutions.

* What are the tools to create new localizations of Etoys? I.e., what
does a translator start with as input and what is created as the result?
What tools are used to create one from the other? How do you check for
completeness?

* How does localized deployment work with Etoys? I.e., what are the
options for providing localized downloads vs. downloading all supported
locales and switching dynamically upon startup?

* Generally speaking, how do people feel about localization in Etoys? Is
it considered to work well, or is it considered to be a painful process?
Are there any obvious alternatives one should look at?

Thanks for any help and pointers you can provide.

Cheers,
- Andreas

Yoshiki Ohshima-2

Re: Localization for Squeak products

At Mon, 18 Jan 2010 21:59:55 -0800,
Andreas Raab wrote:

>
> Hi -
>
> We're looking into localizing our products at Teleplace and I'm looking
> around for options in this area. If I understand it correctly, Etoys
> have been localized in many languages. What I'm curious about is what
> the localization process looks like in practice:
>
> * What needs to be done to support localization in Etoys when writing
> code? I.e., historically all that was needed was a call to #translated,
> but I'm not sure if that's still the case with the move to "standard"
> solutions.

We still use #translated for string literals, and that does the
translation at runtime. But the way the system collects these
literals has changed. GetTextExporter2 scans the image and
collects all senders of #translated, as well as senders of another
selector called #translatedNoop, which is attached to a collection of
strings. As the name suggests, GetTextExporter2 creates a .pot file,
the de facto standard gettext template format.
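
As a rough illustration of that extraction step, here is a Python sketch. It is not the GetTextExporter2 code: it assumes the sources are available as plain text, matches only literals followed directly by the unary message `translated` (the #translatedNoop case is left out), and the sample method bodies and all function names are invented.

```python
import re

# A Smalltalk string literal followed by the unary message `translated`,
# e.g.  'start' translated.  A doubled '' inside a literal is one quote.
SENDER = re.compile(r"'((?:[^']|'')*)'\s+translated\b")


def collect_msgids(source: str) -> list[str]:
    """Return the unique literals that are sent #translated, first seen first."""
    seen: dict[str, None] = {}
    for match in SENDER.finditer(source):
        seen.setdefault(match.group(1).replace("''", "'"), None)
    return list(seen)


def po_quote(text: str) -> str:
    """Quote a string the way .po/.pot files expect."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + escaped + '"'


def to_pot(msgids: list[str]) -> str:
    """Emit one template entry per msgid, each with an empty msgstr."""
    return "\n".join(f'msgid {po_quote(m)}\nmsgstr ""\n' for m in msgids)


# Invented sample source, for illustration only.
sample_source = """
startButtonLabel
    ^ 'start' translated

warning
    ^ 'don''t panic' translated , ' ' , 'start' translated
"""

print(to_pot(collect_msgids(sample_source)))
```

A real exporter works on the compiled methods in the image rather than on text, so it does not need a regular expression at all; the sketch only shows what goes in and what comes out.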

(BTW, the work described above and below was mostly done by
Korakurider. Big thanks to him!)

> * What are the tools to create new localizations of Etoys? I.e., what
> does a translator start with as input and what is created as the result?
> What tools are used to create one from the other? How do you check for
> completeness?

The created .pot file is uploaded to an online translation site that
uses Pootle. The volunteers provide the translations, and at
release build time we collect the translations and compile them to
.mo files. GetTextTranslator, a subclass of
NaturalLanguageTranslator, opens the .mo file and looks up the
translation of a given string in it.

The upside of this is that the volunteer translators don't have to
know the context of how these phrases are used, but just look at the
web site and translate them all. And Pootle is what Sugar and OLPC
people used so being able to use the same mechanism was a big plus.
The downside is that they translate them without the context; so a
phrase with multiple meanings may not get translated properly.

Also, a phrase can be used in different ways, and not being able to
translate it differently for each use is a problem. We thought of
several ways to solve this problem... One was to modify these words in
the source code (e.g. a phrase like "start" to "start (verb)" and
"start (noun)"), but that would have invalidated a lot of volunteer
work and required us to provide the English translation as well. If
Pootle had been more flexible, we could have attached an annotation to
each phrase to indicate its use (which would still have required
source changes), but that didn't happen. Splitting the phrases into
different text domains was another possibility, and it is good for
other reasons (Korakurider even has the code), but it didn't happen
for various reasons.

(I thought we started out from 4,000 or such phrases for Etoys, but
now it appears to have 27,000 or such. Not sure what it means...)

> * How does localized deployment work with Etoys? I.e., what are the
> options for providing localized downloads vs. downloading all supported
> locales and switching dynamically upon startup?

In general, we bundle these .mo files in a single release. Upon
startup, GetTextTranslator scans the specified directory for available
languages and shows them in the menu. (And tries to switch to the
system language.)
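
A minimal Python sketch of that startup scan, for illustration. It assumes the conventional gettext layout `<lang>/LC_MESSAGES/<domain>.mo`; the domain name `etoys`, the fallback to `en`, and both function names are assumptions, not the GetTextTranslator implementation.

```python
import tempfile
from pathlib import Path


def available_languages(locale_dir: Path, domain: str) -> list[str]:
    """List language codes that ship a <lang>/LC_MESSAGES/<domain>.mo file."""
    return sorted(
        mo.parent.parent.name
        for mo in locale_dir.glob(f"*/LC_MESSAGES/{domain}.mo")
    )


def pick_language(system_locale: str, available: list[str], default: str = "en") -> str:
    """Prefer an exact match (pt_BR), then the bare language (pt), then default."""
    code = system_locale.split(".")[0]  # drop an encoding suffix such as .UTF-8
    for candidate in (code, code.split("_")[0]):
        if candidate in available:
            return candidate
    return default


# Demo with a throwaway directory holding empty placeholder .mo files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for lang in ("de", "ja", "pt_BR"):
        target = root / lang / "LC_MESSAGES"
        target.mkdir(parents=True)
        (target / "etoys.mo").write_bytes(b"")
    langs = available_languages(root, "etoys")

print(langs)
print(pick_language("ja_JP.UTF-8", langs))
```

Because all catalogs ship in one release, switching languages at runtime only means picking a different file from the same list.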

> * Generally speaking, how do people feel about localization in Etoys? Is
> it considered to work well, or is it considered to be a painful process?
> Are there any obvious alternatives one should look at?

This is subjective, but I thought the process overall worked pretty
well, given that the volunteers are all over the place.

There were some things that didn't go as nicely as one would hope. For
some languages, even though the translation was provided, the work was
unfortunately wasted because the text rendering was only available on
Unix (on XO, it is kind of okay but the UI layout for RTL languages
is still not there). For many languages where there are not many Etoys
users and the translators are not actually using Etoys, I have to think
that the quality of translation may not be optimal. Pootle was quite slow
and required a lot of manual intervention, and merging huge translation
files for Etoys was indeed a pain (I don't know much about the
behind-the-scenes work here, but it often took weeks to get it done).

The bottom line: I'd think that going through the gettext
mechanism, possibly with nicely annotated strings in the code, would be
okay. One might want to go for the "system resource strings"-ish
convention where you only write string IDs in the source code and
keep the strings separate from it, but that looks like just annotations
to me. For company work, people can edit .po files with an
existing editor, and there you can embed comments to show the usage.

-- Yoshiki

Andreas.Raab

Re: [etoys-dev] Re: Localization for Squeak products

Hi Yoshiki -

A couple more questions:
> The created .pot file is uploaded to an online translation site that
> uses Pootle. The volunteers provides the translation and at the
> release build time, we collect the translation, compile them to the
> .mo files. GetTextTranslator, a subclass of
> NaturalLanguageTranslator, opens the .mo file and looks up the
> translation of given string from it.

Can you point me to the site where the translations are hosted?

> Also, a phrase is used in different way and not being able to
> translate it differently is a problem. We thought of several ways to
> solve this problem... One was to modify these words in the source
> code (e.g. a phrase like "start" to "start (verb)" and "start (noun)")
> but it would have resulted in invalidating a lot of volunteer work and
> having to provide the English translation. If Pootle was flexible, we
> could have an annotation to each phrase to indicate its use (still
> would have required source change), but didn't happen. Splitting the
> phrases into different text domains was another possibility and it is
> good for other reason (korakurider has the code even) but didn't
> happen for various reasons.

I'm not sure what "splitting this into different text domains" means in
this context. How much of a problem is the issue of translating phrases
differently in practice? Does it happen to pretty much everyone right
away or is that an occasional gotcha that people just work around the
best they can?

> (I thought we started out from 4,000 or such phrases for Etoys, but
> now it appears to have 27,000 or such. Not sure what it means...)

It's probably just more coverage.

>> * How does localized deployment work with Etoys? I.e., what are the
>> options for providing localized downloads vs. downloading all supported
>> locales and switching dynamically upon startup?
>
> In general, we bundle these .mo files in the single release. Upon
> startup, GetTextTranslator scans the specified directory for available
> languages and shows them in the menu. (And tries to switch to the
> system language.)

What are .mo files? How do they relate to .pot files?

>> * Generally speaking, how do people feel about localization in Etoys? Is
>> it considered to work well, or is it considered to be a painful process?
>> Are there any obvious alternatives one should look at?
>
> This is subjective, but I thought the process overall worked pretty
> good, given that volunteers all over the place.

That's kind of what I was asking for. Put differently, if you had the
choice between the current approach and some alternative, would you drop
the current version no questions asked, or would you likely say "you
know what, it's worked for us". From the sound of it it's the latter.

> At the bottom line, I'd think that going through the gettext
> mechanism, possibly with nicely annotated strings in the code would be
> okay. One would want to go for the "system resource strings"-ish
> convention where you only write string IDs in the source code and
> separate strings from the source code, but it looks just annotations
> to me. For a company work, the people can edit .po files by some
> existing editor and there you can embed comment to show the usage.

Thanks, this is all very useful info!

Cheers,
- Andreas

Yoshiki Ohshima-2

Re: [etoys-dev] Re: Localization for Squeak products

At Mon, 18 Jan 2010 23:20:17 -0800,
Andreas Raab wrote:

>
> Hi Yoshiki -
>
> A couple more questions:
> > The created .pot file is uploaded to an online translation site that
> > uses Pootle. The volunteers provides the translation and at the
> > release build time, we collect the translation, compile them to the
> > .mo files. GetTextTranslator, a subclass of
> > NaturalLanguageTranslator, opens the .mo file and looks up the
> > translation of given string from it.
>
> Can you point me to the site where the translations are hosted?

It is here:

http://translate.sugarlabs.org/

> > Also, a phrase is used in different way and not being able to
> > translate it differently is a problem. We thought of several ways to
> > solve this problem... One was to modify these words in the source
> > code (e.g. a phrase like "start" to "start (verb)" and "start (noun)")
> > but it would have resulted in invalidating a lot of volunteer work and
> > having to provide the English translation. If Pootle was flexible, we
> > could have an annotation to each phrase to indicate its use (still
> > would have required source change), but didn't happen. Splitting the
> > phrases into different text domains was another possibility and it is
> > good for other reason (korakurider has the code even) but didn't
> > happen for various reasons.
>
> I'm not sure what "splitting this into different text domains" means in
> this context. How much of a problem is the issue of translating phrases
> differently in practice? Does it happen to pretty much everyone right
> away or is that an occasional gotcha that people just work around the
> best they can?

The text domain is a feature of gettext. We still stick all phrases
into one domain, but if there had been a good boundary along which to
split the phrases, it would have been useful for easing the Pootle
server workload, improving volunteers' perception, and implicitly
disambiguating some phrases.

But this kind of disambiguation was not a big issue, it seems, compared
to the lack of inflection support. Workarounds included replacing one
occurrence of 'start' with ' start ' and providing different
translations for 'start' and ' start ', etc., but real support for
plurals (and gender... which is much harder) would have been good.

> > (I thought we started out from 4,000 or such phrases for Etoys, but
> > now it appears to have 27,000 or such. Not sure what it means...)
>
> It's probably just more coverage.

IIRC, I thought it was around 10,000 or so at one point. Since
it's been automatic, it is not clear to me why it increased from there...

> >> * How does localized deployment work with Etoys? I.e., what are the
> >> options for providing localized downloads vs. downloading all supported
> >> locales and switching dynamically upon startup?
> >
> > In general, we bundle these .mo files in the single release. Upon
> > startup, GetTextTranslator scans the specified directory for available
> > languages and shows them in the menu. (And tries to switch to the
> > system language.)
>
> What are .mo files? How do they relate to .pot files?

.pot is the template file in text format. A volunteer edits the file
with a generic text editor or a special .po editor to make a .po file,
which is in the same text format but with the translations interleaved.
Then a command compiles the .po file to a binary file (.mo file) for
faster access.

> >> * Generally speaking, how do people feel about localization in Etoys? Is
> >> it considered to work well, or is it considered to be a painful process?
> >> Are there any obvious alternatives one should look at?
> >
> > This is subjective, but I thought the process overall worked pretty
> > good, given that volunteers all over the place.
>
> That's kind of what I was asking for. Put differently, if you had the
> choice between the current approach and some alternative, would you drop
> the current version no questions asked, or would you likely say "you
> know what, it's worked for us". From the sound of it it's the
> latter.

Yes. For Etoys, it would have been better if we had incorporated
Korakurider's other attempts to disambiguate phrases earlier, but even
without them it provided a usable system. I had some reservations
earlier about going to gettext; I didn't think getting translations
from people who don't know Etoys was a good idea, but the scalability
of the workflow paid off.

Again, for Teleplace, the translation will be done by people who know
the application, but not necessarily people who can open the Squeak
System Browser and look at the code. In that setting, going to an
external tool makes sense, and gettext is pretty solid.

Supporting more gettext features such as plural support etc. would
be a plus.

> Thanks this is all very useful info!

No problem!

-- Yoshiki

I'm staying up later than usual, as I realized that the script of
the Avatar movie is available online. It clears up some of the
questions I had...

Andreas.Raab

Re: [etoys-dev] Re: Localization for Squeak products

Yoshiki Ohshima wrote:
> At Mon, 18 Jan 2010 23:20:17 -0800, Andreas Raab wrote:
>> Can you point me to the site where the translations are hosted?
>
> It is here:
>
> http://translate.sugarlabs.org/

Thanks!

> The text domain is a feature of gettext. We still stick all phrases
> into one domain but if there was a good boundary to split the phrases,
> it would have been useful for easing Pootle server workload,
> volunteers perception, and implicitly disambiguate some phrases.

Interesting. I'll have to read up on this (which is fine, this is mostly
an information gathering exercise - all these pointers about things
you've tried and haven't are incredibly valuable).

> But this kind of disambiguation was not big, compared to the lack of
> inflection support it seems. Workarounds included to edit one
> occurrence of 'start' with ' start ' and provide different translations
> for 'start' and ' start ' etc. but real support for plural (and
> gender... which is much harder) would have been good.

But that's weird. I mean it's not like you could translate a single word
without context anyway. Aren't you translating entire phrases? If the
phrases are reasonably complete this shouldn't cause a significant
problem I'd think (yes, there are probably situations where you need to
know about the subject matter i.e., 'Start {1}?' may be different
depending on whether the argument is the name of a thing or the name of
an activity but those should be rare, no?). Can you give an example or
two where you found this to be a problem?

> Again for Teleplace, the translation will be done by people who know
> the application, but not necessarily people who can open the Squeak
> System Browser and look at the code. In that setting, going to an
> external tool makes sense and then gettext is pretty solid.

Yes, exactly. The goal is that we can give a handful of files to people
who do this for a living, get it back and just ship it.

> Supporting more gettext features such as plural support etc. would
> be a plus.

I'll check it out.

> I'm staying up later than usual, as I realized that the script of
> the Avatar movie is available online. It clears up some of the
> questions I had...

Heh, heh. I'd say it gets best director, best movie, and special effects
for sure. I found the plot a bit ... predictable (well, it *is* a
Hollywood movie) but it was definitely enjoyable and the effects were
first class. Long though.

Cheers,
- Andreas

Yoshiki Ohshima-2

Re: Re: [etoys-dev] Re: Localization for Squeak products

At Tue, 19 Jan 2010 00:32:13 -0800,
Andreas Raab wrote:

>
> > But this kind of disambiguation was not big, compared to the lack of
> > inflection support it seems. Workarounds included to edit one
> > occurrence of 'start' with ' start ' and provide different translations
> > for 'start' and ' start ' etc. but real support for plural (and
> > gender... which is much harder) would have been good.
>
> But that's weird. I mean it's not like you could translate a single word
> without context anyway. Aren't you translating entire phrases? If the
> phrases are reasonably complete this shouldn't cause a significant
> problem I'd think (yes, there are probably situations where you need to
> know about the subject matter i.e., 'Start {1}?' may be different
> depending on whether the argument is the name of a thing or the name of
> an activity but those should be rare, no?). Can you give an example or
> two where you found this to be a problem?

The earliest examples are in the relatively recent discussion starting at:

http://lists.laptop.org/pipermail/localization/2008-March/000726.html

and

http://tracker.squeakland.org/browse/SQ-139

-- Yoshiki

> Heh, heh. I'd say it gets best director, best movie, and special effects
> for sure. I found the plot a bit ... predictable (well, it *is* a
> Hollywood movie) but it was definitely enjoyable and the effects were
> first class. Long though.

As for "a bit ... predictable", I had also gathered from reviews, before
seeing it, that this could be my impression (well, it is, in fact)...
But all stereotypical stories were already told thousands of years ago,
after all. I'd say if it is done nicely, there is nothing wrong with
picking up such a "king's road" story...

Bert Freudenberg

Re: [etoys-dev] Re: Localization for Squeak products

In reply to this post by Yoshiki Ohshima-2

On 19.01.2010, at 09:14, Yoshiki Ohshima wrote:

>
> At Mon, 18 Jan 2010 23:20:17 -0800,
> Andreas Raab wrote:
>>
>> Hi Yoshiki -
>>
>> A couple more questions:
>>> The created .pot file is uploaded to an online translation site that
>>> uses Pootle. The volunteers provides the translation and at the
>>> release build time, we collect the translation, compile them to the
>>> .mo files. GetTextTranslator, a subclass of
>>> NaturalLanguageTranslator, opens the .mo file and looks up the
>>> translation of given string from it.
>>
>> Can you point me to the site where the translations are hosted?
>
> It is here:
>
> http://translate.sugarlabs.org/
>
>>> Also, a phrase is used in different way and not being able to
>>> translate it differently is a problem. We thought of several ways to
>>> solve this problem... One was to modify these words in the source
>>> code (e.g. a phrase like "start" to "start (verb)" and "start (noun)")
>>> but it would have resulted in invalidating a lot of volunteer work and
>>> having to provide the English translation. If Pootle was flexible, we
>>> could have an annotation to each phrase to indicate its use (still
>>> would have required source change), but didn't happen. Splitting the
>>> phrases into different text domains was another possibility and it is
>>> good for other reason (korakurider has the code even) but didn't
>>> happen for various reasons.
>>
>> I'm not sure what "splitting this into different text domains" means in
>> this context. How much of a problem is the issue of translating phrases
>> differently in practice? Does it happen to pretty much everyone right
>> away or is that an occasional gotcha that people just work around the
>> best they can?
>
> The text domain is a feature of gettext. We still stick all phrases
> into one domain but if there was a good boundary to split the phrases,
> it would have been useful for easing Pootle server workload,
> volunteers perception, and implicitly disambiguate some phrases.
>
> But this kind of disambiguation was not big, compared to the lack of
> inflection support it seems. Workarounds included to edit one
> occurrence of 'start' with ' start ' and provide different translations
> for 'start' and ' start ' etc. but real support for plural (and
> gender... which is much harder) would have been good.

See here for an example of context use:

http://www.gnu.org/software/gettext/manual/gettext.html#Contexts

In fact you should skim the whole gettext manual. It's the de-facto standard for localization in the open-source world.
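
To make the context idea concrete, here is a small Python sketch of how a context can disambiguate 'start' as a verb from 'start' as a noun. GNU gettext stores a context-qualified entry under the key `msgctxt + "\x04" + msgid`; everything else here (the dictionary standing in for a compiled catalog, the context names, the German strings) is invented for illustration and is not how Squeak or Etoys does it.

```python
CONTEXT_GLUE = "\x04"  # separator GNU gettext puts between msgctxt and msgid

# Hypothetical German catalog; a real one would be loaded from a .mo file.
catalog_de = {
    "verb" + CONTEXT_GLUE + "start": "starten",
    "noun" + CONTEXT_GLUE + "start": "Start",
    "stop": "anhalten",
}


def pgettext(catalog: dict[str, str], context: str, msgid: str) -> str:
    """Look up msgid within a context; fall back to the untranslated msgid."""
    return catalog.get(context + CONTEXT_GLUE + msgid, msgid)


print(pgettext(catalog_de, "verb", "start"))
print(pgettext(catalog_de, "noun", "start"))
```

The source string stays plain 'start' in both places, so existing translations of other phrases are untouched; only the ambiguous call sites need to name a context.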

>>> (I thought we started out from 4,000 or such phrases for Etoys, but
>>> now it appears to have 27,000 or such. Not sure what it means...)
>>
>> It's probably just more coverage.
>
> IIRC, I thought it was around 10,000 or such at one point. Since
> it's been automatic, the increase from there is not clear to me why...

It's even simpler than that - some tools count strings, some words. Etoys currently has 4412 translatable strings with 27454 words in them.

>>>> * How does localized deployment work with Etoys? I.e., what are the
>>>> options for providing localized downloads vs. downloading all supported
>>>> locales and switching dynamically upon startup?
>>>
>>> In general, we bundle these .mo files in the single release. Upon
>>> startup, GetTextTranslator scans the specified directory for available
>>> languages and shows them in the menu. (And tries to switch to the
>>> system language.)
>>
>> What are .mo files? How do they relate to .pot files?
>
> .pot is the template file in text format. A volunteer edits the file
> with a generic or special editor for editing .pot to make a .po file,
> which is in the same text format but with translation interleaved.
> Then a command compiles a .po file to a binary file (.mo file) for
> faster access.

The "msgfmt" program compiles these. For example, it puts a table at the file's beginning for rapid lookup of strings. We don't read the whole file at startup, but load strings on demand.
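
For readers who have not looked inside a .mo file, the following Python sketch builds a tiny one in memory and reads it back with the standard library's gettext module. It is a simplified stand-in for msgfmt, not a replacement: it writes no hash table, the `build_mo` helper is hypothetical, and the French string is only sample data.

```python
import gettext
import io
import struct

MO_MAGIC = 0x950412DE  # magic number of GNU .mo files


def build_mo(catalog: dict[str, str]) -> bytes:
    """Serialize msgid -> msgstr pairs in the GNU .mo layout (little-endian)."""
    # The format expects entries sorted by msgid.
    entries = sorted(
        (msgid.encode("utf-8"), msgstr.encode("utf-8"))
        for msgid, msgstr in catalog.items()
    )
    count = len(entries)
    originals_at = 7 * 4  # tables start right after the 7-word header
    translations_at = originals_at + count * 8
    strings_at = translations_at + count * 8

    originals, translations, strings = b"", b"", b""
    for msgid, msgstr in entries:
        # Each table row is (length, file offset) of a NUL-terminated string.
        originals += struct.pack("<II", len(msgid), strings_at + len(strings))
        strings += msgid + b"\0"
        translations += struct.pack("<II", len(msgstr), strings_at + len(strings))
        strings += msgstr + b"\0"

    # Header: magic, revision 0, entry count, two table offsets,
    # hash table size 0 (no hash table) and its nominal offset.
    header = struct.pack(
        "<7I", MO_MAGIC, 0, count, originals_at, translations_at, 0, strings_at
    )
    return header + originals + translations + strings


mo_bytes = build_mo({
    "": "Content-Type: text/plain; charset=UTF-8\n",  # catalog metadata entry
    "start": "démarrer",
})
translator = gettext.GNUTranslations(io.BytesIO(mo_bytes))
print(translator.gettext("start"))  # found in the catalog
print(translator.gettext("stop"))   # missing, so returned unchanged
```

The two offset tables at the front are what let a reader locate any string without scanning the whole file.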

>>>> * Generally speaking, how do people feel about localization in Etoys? Is
>>>> it considered to work well, or is it considered to be a painful process?
>>>> Are there any obvious alternatives one should look at?
>>>
>>> This is subjective, but I thought the process overall worked pretty
>>> good, given that volunteers all over the place.
>>
>> That's kind of what I was asking for. Put differently, if you had the
>> choice between the current approach and some alternative, would you drop
>> the current version no questions asked, or would you likely say "you
>> know what, it's worked for us". From the sound of it it's the
>> latter.
>
> Yes. For Etoys, if we had incorporated korakurider's other attempts
> to disambiguate phrases earlier it would have been better but without
> such it provided a usable system. I had some reservations earlier
> when going to gettext; I didn't see the idea of getting translations
> from people who don't know Etoys a good one, but the scalability of
> workflow paid off.
>
> Again for Teleplace, the translation will be done by people who know
> the application, but not necessarily people who can open the Squeak
> System Browser and look at the code. In that setting, going to an
> external tool makes sense and then gettext is pretty solid.
>
> Supporting more gettext features such as plural support etc. would
> be a plus.
>
>> Thanks this is all very useful info!
>
> No problem!

Have to agree with Yoshiki, overall it worked pretty well for us.

One thing we tried and abandoned, but might reinstate, is splitting the single large Etoys pot into several. Support for that already exists: a string is first looked up in the domain file named after the Squeak package of the method sending #translated. If it is not found there, the default translation file is used (in our case, the single Etoys file). This allows per-package translation files and IIRC it did work fine for Hilaire's DrGeoII.

It would be very beneficial for Etoys because it would allow translators to prioritize their work. We could have a file with all the tile translations, one with the UI strings easily visible in Etoys, and then "the rest" which covers all Smalltalk tools and hidden dialogs etc. The single file we have is too daunting.
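
The per-package lookup with a default fallback can be sketched in Python with the standard library's fallback chaining (`NullTranslations.add_fallback`). The `DictTranslations` class, the package name 'Etoys-Tiles', and the catalog contents are all made up for the example; the Squeak implementation is not shown in this thread.

```python
import gettext


class DictTranslations(gettext.NullTranslations):
    """In-memory stand-in for one .mo file, i.e. one text domain."""

    def __init__(self, catalog: dict[str, str]):
        super().__init__()
        self._catalog = dict(catalog)

    def gettext(self, message: str) -> str:
        if message in self._catalog:
            return self._catalog[message]
        # NullTranslations asks the fallback, or returns the message as is.
        return super().gettext(message)


# The single large default file, plus one hypothetical per-package domain.
default_domain = DictTranslations({"start": "starten", "stop": "anhalten"})
tiles_domain = DictTranslations({"forward by": "gehe vorwärts um"})
tiles_domain.add_fallback(default_domain)

domains = {"Etoys-Tiles": tiles_domain}


def translate(package: str, message: str) -> str:
    """Try the domain named after the sending package, then the default."""
    return domains.get(package, default_domain).gettext(message)


print(translate("Etoys-Tiles", "forward by"))  # from the package domain
print(translate("Etoys-Tiles", "stop"))        # from the default file
```

With this arrangement a small, high-priority file can be translated first while untranslated packages keep working off the default catalog.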

- Bert -

Bert Freudenberg

Re: [etoys-dev] Re: Localization for Squeak products

On 19.01.2010, at 10:50, Bert Freudenberg wrote:
> We don't read the whole file at startup, but load strings on demand.

Oops, actually we do load the whole file (though on demand), as I was just reminded.

If someone needs details on the actual implementation used in Etoys, Korakurider would be the best to ask.

- Bert -

K. K. Subramaniam

Re: [etoys-dev] Re: Localization for Squeak products

In reply to this post by Bert Freudenberg

On Tuesday 19 January 2010 03:20:30 pm Bert Freudenberg wrote:
> It's even simpler than that - some tools count strings, some words. Etoys
> currently has 4412 translatable strings with 27454 words in them.
5k is still a lot of strings to localize in one shot. The problem with coding
in English first and then farming out the batch of strings for localization is
that localization will always lag in both time and quality.

Why not allow users to translate strings on the fly using a hot key? If I see a
button label or error message in English I should be able to use a hot key
(translateIt) to pop up a translator which allows me to edit the translations
in that context. If the base string changes, existing translations could be
flagged as obsolete but still left in place. Obsolete translations could be
displayed in a different style (gray?) or dropped in favor of a current default
translation to alert users to the need to retranslate.

Even non-programmers can participate as peers during the development phase for
a simultaneous multilingual release.

This is difficult to do in other languages but should be feasible in a dynamic
language like Smalltalk.

Subbu

Yoshiki Ohshima-2

Re: Localization for Squeak products

In reply to this post by Bert Freudenberg

At Tue, 19 Jan 2010 10:50:30 +0100,
Bert Freudenberg wrote:
>
> See here for an example of context use:
>
> http://www.gnu.org/software/gettext/manual/gettext.html#Contexts
>
> In fact you should skim the whole gettext manual. It's the de-facto standard for localization in the open-source world.

Sure (but sorry for the wrong terminology on context and domain). I was
referring to supporting code in Squeak for these features. We can't
translate plurals and singulars nicely yet.
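
To show what plural support involves, here is a Python sketch of the gettext approach: each language declares a rule that maps a number to a plural-form index, and the catalog stores one translation per index. The rule below is the Polish one listed in the gettext manual (three forms); the catalog, the function names, and the English fallback are assumptions made for the example.

```python
def plural_index_pl(n: int) -> int:
    """Polish plural rule from the gettext manual (nplurals=3)."""
    if n == 1:
        return 0
    if 2 <= n % 10 <= 4 and (n % 100 < 10 or n % 100 >= 20):
        return 1
    return 2


# Hypothetical catalog: (singular, plural) msgid pair -> one string per form.
PLURALS_PL = {("file", "files"): ["plik", "pliki", "plików"]}


def ngettext(singular: str, plural: str, n: int) -> str:
    """Pick the plural form for n, or fall back to the English rule."""
    forms = PLURALS_PL.get((singular, plural))
    if forms is None:
        return singular if n == 1 else plural
    return forms[plural_index_pl(n)]


for count in (1, 2, 5, 22):
    print(count, ngettext("file", "files", count))
```

A lookup keyed on a single string, as with #translated alone, has no place to carry the number, which is why plurals need their own call.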

-- Yoshiki