Handling keyboard input on unix when locale set to utf-8

13 messages

Danil Osipchuk-2

Handling keyboard input on unix when locale set to utf-8

Hello, all.

Many modern Linux distros now use a UTF-8 locale by default, yet the stock Unix VM has never seemed to handle Unicode keyboard input under such a locale. Are there any plans to fix this properly, or are Linux users supposed to fix the VM themselves (for example, by beating their heads against this thread: http://www.nabble.com/Unix-UTF8-input-td11050488.html )? Maybe someone already has a 'proper' VM; it is hard to believe that such a drawback was not fixed long ago.

cheers,
Danil

Janko Mivšek

Re: Handling keyboard input on unix when locale set to utf-8

I second Danil's question. Is there anyone willing to dig into that
problem? I am willing to help as much as I can, but I don't know much
about VM or Linux internals. I jumped over C directly to Smalltalk, you know ... :)

Janko

--
Janko Mivšek
AIDA/Web
Smalltalk Web Application Server
http://www.aidaweb.si

Danil Osipchuk-2

Re: Handling keyboard input on unix when locale set to utf-8

Heh, the silence (I hope I don't look too impatient; this topic has been recurring for years now). Janko, let's face the fact that we are losers with the wrong VM platform/language combination :)
I suspect that the OLPC team has somehow addressed the issue (it is Linux-based, most probably with a UTF-8 locale). I also see a UnixUTF8JPInputInterpreter class in the image, so Japanese users also have a solution.

I'm sure that we can do it too, at least by adopting Martin Kuball's solution mentioned before (although I would prefer the approach taken in recent Mac and Windows VMs: adding a Unicode code point as the sixth field of the event buffer).

But I have another concern now. What will happen to the patch? Will it find its way into the core VM? Is the Unix VM being maintained?

cheers,
Danil


Alexander Serkov

Re: Handling keyboard input on unix when locale set to utf-8

Hi folks!

I am quite new to the Squeak and Smalltalk community,
so my solution is just of the "it works for me" sort.

The attached patch fixes UTF-8 keyboard input,
while clipboard copy/paste is still broken.


--
Best regards Alexander Serkov

Attachment: squeak-unicode-input.patch (3K)

José Luis Redrejo

Re: Handling keyboard input on unix when locale set to utf-8

Hi Alexander, I've applied your patch and it doesn't work. The image no longer raises the error it did in the past (the VM returned code -31 instead of 135 for 'á' when not using UTF-8, or just ignored dead keys when using it), but I only see strange characters in the image when trying to type á, é, etc.
I've applied it to the current svn olpc and trunk branches; maybe you're using another revision/version. In that case, please tell me which one you used.

This is the output of my locales:

LANG=es_ES.UTF-8
LC_CTYPE="es_ES.UTF-8"
LC_NUMERIC="es_ES.UTF-8"
LC_TIME="es_ES.UTF-8"
LC_COLLATE="es_ES.UTF-8"
LC_MONETARY="es_ES.UTF-8"
LC_MESSAGES="es_ES.UTF-8"
LC_PAPER="es_ES.UTF-8"
LC_NAME="es_ES.UTF-8"
LC_ADDRESS="es_ES.UTF-8"
LC_TELEPHONE="es_ES.UTF-8"
LC_MEASUREMENT="es_ES.UTF-8"
LC_IDENTIFICATION="es_ES.UTF-8"
LC_ALL=es_ES.UTF-8
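
As an aside on the numbers quoted above: -31 and 135 are what one would see if 'á' arrived as a Latin-1 byte read through a signed char while the image expected the Mac Roman code, and the "strange characters" resemble UTF-8 bytes displayed one byte at a time. Whether the VM really does this is an assumption; nothing in this thread confirms it. A minimal Python illustration (not code from the VM or from the patch):

```python
ch = "á"

latin1 = ch.encode("latin-1")        # one byte: 0xE1
mac_roman = ch.encode("mac_roman")   # one byte: 0x87
utf8 = ch.encode("utf-8")            # two bytes: 0xC3 0xA1


def as_signed_char(byte):
    """Interpret an unsigned byte value (0..255) as a signed 8-bit value."""
    return byte - 256 if byte > 127 else byte


signed_latin1 = as_signed_char(latin1[0])  # Latin-1 byte seen as a signed char
mac_value = mac_roman[0]                   # the code a Mac Roman consumer expects
mojibake = utf8.decode("latin-1")          # UTF-8 bytes shown one byte per character

print(signed_latin1, mac_value, repr(mojibake))
```

If that reading is right, the fix has to decode the whole multi-byte sequence before the value reaches the image, rather than passing bytes through one at a time.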

Best Regards.
José L.


Janko Mivšek

Re: Handling keyboard input on unix when locale set to utf-8

In reply to this post by Alexander Serkov

Hi,

Well, we are three already, so the chance that UTF-8 is finally and
completely adopted in Squeak is a bit bigger :) And I propose that a
patch should be done (complete and well tested) regardless of adoption,
which will come sooner or later.

Such a patch will probably break some existing code, but because UTF-8
solves the problem once and for all, I think the existing code should be
adapted to UTF-8 by its authors.

Janko


José Luis Redrejo

Re: Handling keyboard input on unix when locale set to utf-8

2008/1/17, Janko Mivšek <[hidden email]>:

> Hi,
>
> Well, we are three already,

I'm sure we are many more: there are also people from French-speaking countries on this list using Linux, and they won't be very happy when they upgrade their VMs and see that à or î cannot be typed anymore. And if you think of some of the countries where OLPC is being used, such as Brazil... how many users will get annoyed?

> so the chance that UTF-8 is finally and
> completely adopted in Squeak is a bit bigger :) And I propose that a
> patch should be done (complete and well tested) regardless of adoption,
> which will come sooner or later.

Right, but who does it?

> Such a patch will probably break some existing code, but because UTF-8
> solves the problem once and for all, I think the existing code should be
> adapted to UTF-8 by its authors.

Andreas Raab already did it for the Windows VM, and the world is still going round...

Regards.

Danil Osipchuk-2

Re: Handling keyboard input on unix when locale set to utf-8

Hooray: Ian Piumarta (praise him) has already done the major part (if not all) of the work. Commits made 3-4 months ago in the Unix branch actually implement the sixth field in the event buffer, the ucs4 field. One only has to use it in the corresponding InputInterpreter (and it will even be portable across platforms in simple cases, if one doesn't care about older images/VMs).

I just needed to check the svn first (and yes, the Unix VM is maintained; I take back all my rambling)

:) :) :)
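
A rough Python sketch of the sixth-field idea, for illustration only. The eight-slot layout, the field order, the constants, and the names decode_utf8_to_ucs4 and make_key_event are assumptions made for this example, not the VM's actual C code. The VM side decodes the UTF-8 bytes delivered by the window system into one code point and stores it in the sixth slot, keeping the old single-byte slot only for values below 256.

```python
def decode_utf8_to_ucs4(data: bytes) -> int:
    """Decode the first UTF-8 encoded character in data and return its code point.

    Minimal on purpose: it checks lead and continuation bytes but does not
    reject overlong encodings or surrogate code points.
    """
    first = data[0]
    if first < 0x80:                      # plain ASCII, one byte
        return first
    if first >> 5 == 0b110:               # 110xxxxx: two-byte sequence
        length, code = 2, first & 0x1F
    elif first >> 4 == 0b1110:            # 1110xxxx: three-byte sequence
        length, code = 3, first & 0x0F
    elif first >> 3 == 0b11110:           # 11110xxx: four-byte sequence
        length, code = 4, first & 0x07
    else:
        raise ValueError("invalid UTF-8 lead byte")
    if len(data) < length:
        raise ValueError("truncated UTF-8 sequence")
    for byte in data[1:length]:
        if byte >> 6 != 0b10:             # continuation bytes are 10xxxxxx
            raise ValueError("invalid UTF-8 continuation byte")
        code = (code << 6) | (byte & 0x3F)
    return code


def make_key_event(utf8_bytes: bytes, timestamp: int = 0) -> list:
    """Build a hypothetical 8-slot keyboard event with the code point in slot six."""
    ucs4 = decode_utf8_to_ucs4(utf8_bytes)
    legacy_code = ucs4 if ucs4 < 256 else 0   # the old slot cannot hold more
    # Assumed order: type, timestamp, char code, press state,
    #                modifiers, ucs4, reserved, window index
    return [2, timestamp, legacy_code, 0, 0, ucs4, 0, 0]


event = make_key_event("я".encode("utf-8"))
print(hex(event[5]))
```

On the image side, an interpreter then only needs to read the sixth slot, which is what makes the approach portable across VMs that fill it.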


Janko Mivšek

Re: Handling keyboard input on unix when locale set to utf-8

Hi Danil,

How did your check go? I can hardly wait for good news :)

And José's question should be answered too: when a patch is ready,
tested and proven, who will integrate it into mainstream Squeak?

Well, let us provide a patch first, put it into Mantis at
http://bugs.squeak.org, then test it more broadly on our dev images, then
on Damien's squeak-dev ones, and if all goes well, I think the community
will be persuaded enough to accept the patch into mainstream Squeak.

Janko


Danil Osipchuk-2

Re: Handling keyboard input on unix when locale set to utf-8

Janko hello,

I guess we don't need a patch because, as I said, the VM in the Unix trunk already has the needed functionality.
I'm at work at the moment and using Windows, but yesterday evening I compiled the trunk VM and it works exactly the way I expected (the good news you are asking for :) ) on Kubuntu with a UTF-8 locale. I actually entered Russian text and it was shown in panes. I had no time to try copy-pasting and file name listing, but the keyboard input is working for sure.

To do this one needs to get fonts (the excellent work of Andrew Tween and others makes it trivial on all platforms).
Then you need to create a LanguageEnvironment; there are examples in the image. The LanguageEnvironment provides a keyboard InputInterpreter, which in turn implements #nextCharFrom:firstEvt: (see the attached picture with an example that now works for all main VMs: Unix, Mac and Windows :))

After switching to the configured language environment (Locale switchToID: (LocaleID isoLanguage: 'ru') ), the corresponding character handling is installed.

There are places where the current Squeak environment is not ready: Damien's shiny dev images may occasionally present a walkback to you (usually 'out of bounds' errors, easily fixable in the debugger by changing #at: to the obvious #at:ifAbsent: implementation).

2008/1/18, Janko Mivšek <[hidden email]>:

> Hi Danil,
>
> How are the results of your check? Hardly wait good news :)
>
> And José's question should be answered too: when a patch will be ready,
> tested and proven, who will integrate it into mainstream Squeak?

Also, Ian (the Unix VM maintainer) turned out to be alive and active; I even noticed a coding style recommendation for contributors added to the doc section. So that is not a problem. But currently I'm not sure whether other changes are needed.

cheers,
Danil


Attachment: interpreter.jpeg (64K)

José Luis Redrejo

Re: Handling keyboard input on unix when locale set to utf-8

2008/1/18, danil osipchuk <[hidden email]>:

> Janko hello,
>
> I guess we don't need a patch because as I said the VM in unix trunk already has needed functionality.
> I'm at work at the moment and using windows, but yesterday's evening I compiled the trunk VM and it does work in the way exactly I expected (the good news your are asking for :) ) on kubuntu with utf-8 locale. I actually entered Russian text and it was shown in panes. I had no time to try copy-pasting and file name listing but the keyboard input is working for sure.
>
> To do this one needs to get fonts (Andrew Tween's and others excellent work makes it trivial for all platforms).

Do you mean that the freefont packages and plugin must be installed, or can the Bitstream fonts available in the current image be used?

> Then you need to create a LanguageEnvironment - there are examples in image. LanguageEnvironment provides keyboard InputInterpreter which in turn implements #nextCharFrom:firstEvt: (see the attached picture with the example which works now for all main VMs - unix, mac and windows :))
>
> After switching to configured language environment (Locale switchToID: (LocaleID isoLanguage: 'ru') ) corresponding character handling is installed.

The Spanish LanguageEnvironment is already present in the image, and I can see letters such as ñ, ó, etc. if I open images that already contain those characters or if I open a file containing them, but I cannot type those characters using the keyboard: when I try to type 'á', all I get is '?a'.

So, in brief, I've been testing this every time an svn change has happened since last September, without any success. So please, could you explain your steps in more detail, especially:
- the image you used
- the fonts you used
- whether it works without the freefont package and freefont plugin

and confirm that characters with dead keys (accents) work on your keyboard?

Thanks for your info.

Danil Osipchuk-2

Re: Handling keyboard input on unix when locale set to utf-8

Jose hi
(sorry, I don't have accents on my keyboard to spell your name correctly :))

I'm no expert on the topic and am about to go home, but I hope I can help. First of all, I assume you have a Unix VM compiled from the current trunk.

> > To do this one needs to get fonts (Andrew Tween's and others excellent work makes it trivial for all platforms).
>
> Do you mean that freefont packages and plugin must be installed or using current Bitstream fonts available in the current image could be used?

My guess is that the stock Bitstream fonts are perfectly OK for Spanish (and other Latin-script languages). For languages with glyphs outside the first 256 character codes, you have to make an effort.

> > Then you need to create a LanguageEnvironment - there are examples in image. LanguageEnvironment provides keyboard InputInterpreter which in turn implements #nextCharFrom:firstEvt: (see the attached picture with the example which works now for all main VMs - unix, mac and windows :))
> >
> > After switching to configured language environment (Locale switchToID: (LocaleID isoLanguage: 'ru') ) corresponding character handling is installed.
>
> Spanish LanguageEnvironment is already created in the image, and I can see letters as ñ , ó, etc. if I open images that already contain those characters or if I open a file containing it, but I can not type those characters using the keyboard: Trying to type 'á' all I get is '?a'.

OK, looking at the 'es' entry in KnownEnvironments of LanguageEnvironment, I see that Latin1Environment corresponds to it. One of the tasks of a LanguageEnvironment subclass is to provide a method for interpreting keyboard events.

For Latin1Environment it returns MacRomanInputInterpreter unconditionally.
Typically a LanguageEnvironment tries to guess the best-fitting InputInterpreter (look at the Japanese one, for example), but I would not recommend bothering with that for now.

So I suspect that if, instead of the default MacRomanInputInterpreter, you use something along the lines I have suggested, keyboard input will work. The quickest, though dirty and cruel, hack is just to copy-paste the following snippet into:

MacRomanInputInterpreter>>#nextCharFrom: sensor firstEvt: evtBuf

^ evtBuf sixth asCharacter

and see what happens.

Or you may apply the changeset in the attachment and switch locales back and forth:

Locale switchToID: (LocaleID isoLanguage: 'en').
Locale switchToID: (LocaleID isoLanguage: 'es')

The idea is that the LanguageEnvironment should detect the VM and other conditions and set up the best methods for character conversion. (The change set doesn't do that.)
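
To make the selection idea concrete, here is a small Python model; it is not Squeak code. The flag vm_fills_ucs4_field, the function names, and the event layout (third slot for the legacy key code, sixth for the code point) are assumptions for illustration. The environment picks one of two interpreters depending on whether the VM is known to fill the sixth field.

```python
def mac_roman_next_char(evt_buf):
    """Legacy path: treat the third slot as a single Mac Roman byte."""
    return bytes([evt_buf[2]]).decode("mac_roman")


def ucs4_next_char(evt_buf):
    """Newer path: treat the sixth slot as a Unicode code point."""
    return chr(evt_buf[5])


def choose_interpreter(vm_fills_ucs4_field):
    """Pick the interpreter once, when the language environment is installed."""
    return ucs4_next_char if vm_fills_ucs4_field else mac_roman_next_char


# A key event for Cyrillic 'я' (U+044F) from a VM that fills the sixth slot.
new_style_event = [2, 0, 0, 0, 0, 0x044F, 0, 0]
# A key event for 'á' from an older VM that only sets the Mac Roman code 0x87.
old_style_event = [2, 0, 0x87, 0, 0, 0, 0, 0]

print(choose_interpreter(True)(new_style_event))
print(choose_interpreter(False)(old_style_event))
```

Choosing once per environment keeps the per-keystroke path trivial; a per-event check of the sixth slot for zero would be another option if VMs are mixed.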

> So, in brief, I've being testing this everytime a svn changes happens since last september without any success, so, please, could you explain in more details your steps, specially:
> - image you used

3.9 and 3.10 images, both stock and Damien's 'dev'

> - fonts you used

Andrew's freetype package (because of Russian; you don't have to, I think)

> - use it without freefont package and freefont plugin

There are other ways, but again I guess you don't need them.

> and assure that character with dead keys (accents) work in your keyboard?

I don't have them (Russian keyboard), but I may try enabling different layouts at home when I have Linux at hand.

> Thanks for your info.

Hope this helps.

Attachment: TestKeysUTF8Latin1.1.cs (792 bytes)

Danil Osipchuk-2

Re: Handling keyboard input on unix when locale set to utf-8

> There are other ways, but again I guess you don't need them.
>
> > and assure that character with dead keys (accents) work in your keyboard?
>
> I don't have them (Russian keyboard) - but I may try to enable different layouts at home when I have linux at hand.

I reached home: dead keys (if I understand it right) seem to be split into a modifier and the key that follows.
I'm sure this can be handled in the InputInterpreter, but I'm not sure whether that is the intended behaviour.
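
A hedged sketch of what handling it in an interpreter could look like, written in Python rather than Smalltalk. It assumes the dead key arrives as its spacing accent character followed by the base letter; the thread does not establish what the VM actually delivers. The mapping table and the name compose_dead_key are made up for this example. The accent is mapped to its combining form and Unicode NFC normalization produces the precomposed letter.

```python
import unicodedata

# Spacing accent (as assumed to be delivered for a dead key) -> combining mark.
DEAD_KEY_TO_COMBINING = {
    "\u00b4": "\u0301",  # acute accent
    "`": "\u0300",       # grave accent
    "^": "\u0302",       # circumflex
    "~": "\u0303",       # tilde
    "\u00a8": "\u0308",  # diaeresis
}


def compose_dead_key(dead, base):
    """Combine a dead-key accent and the following letter into one character.

    Falls back to the two original characters when the first one is not a
    known dead key or when Unicode has no precomposed form for the pair.
    """
    combining = DEAD_KEY_TO_COMBINING.get(dead)
    if combining is None:
        return dead + base
    composed = unicodedata.normalize("NFC", base + combining)
    return composed if len(composed) == 1 else dead + base


print(compose_dead_key("\u00b4", "a"))
print(compose_dead_key("~", "n"))
```

Composing in the window-system layer before the event reaches the image would be the other obvious place to do this, which may be why it is unclear whether the split is intended.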