Hi,

I wanted to revisit the old discussions from 2003/2005/2007 about getting the Squeak VM to handle multilingual input in an Indic context.

Indic keyboard input can come through XIM in multiple languages regardless of the locale setting. LANG may be set to en_US.UTF-8 or en_IN.UTF-8. XIM input method engines are used to generate multilingual keystrokes, so the app only sees UTF-8 encoded characters, not keys.

The current design for passing multiple encodings into the image will not work for m17n. Currently, the logic for the keycode and the keychar (i.e. the character typed, possibly composed of multiple keycodes) is scattered across the VMs and images. Tying input encoding to locales complicates Indic support. Composition is platform-specific and is best handled in the VM.

It looks to me that the complications are due to multiplexing of two keyboard input streams here - control codes (buttons) and text codes (Characters). Buttons are used to fire operations while Characters go into text streams. Button codes need to deal with modifiers and codes < 127 but not with m17n, AFAIK. Character codes don't need modifiers but need to deal with m17n issues.

Here is my proposal to move forward without affecting existing deployments:

1. Map key codes into button codes (e.g. OK, Cancel, Cut, Copy, ...) in the VM itself and pass only button codes into the image. I like Lex Spoon's proposal to start with X11 encodings for buttons (keysymdef.h). One of the button codes can be reserved for a soft keyboard. New images can start using these codes and be ready to run on handhelds and tablets too.

2. Henceforth all VMs will encode all char inputs in utf8 except for non-en Latin1 locales. For these locales, latin1 will be used by default and utf8 if -compositioninput is used. The new VM will pass a dummy button on startup to signal input encoding, or we could introduce a new primitive to signal this state. New images can use it to unify clipboard and input interpreters. Old images can be patched to work with new VMs.

I am not fully aware of the past history of code page issues for latin1 locales, so this is just a strawman. Please do point out gaps in it.

Thanks .. Subbu
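
The split between button codes and character codes described in the proposal could be sketched roughly as below. This is only an illustration, not code from the Squeak VM: the names InputClass and classify_keysym are hypothetical, and treating any key pressed with a command modifier as a button is an assumption. The one fact relied on is that X11 function keysyms (BackSpace, Return, arrows, F-keys and so on) occupy the range 0xFF00 to 0xFFFF in keysymdef.h.

```c
#include <stdint.h>

/* Hypothetical event classes for illustration; not from the Squeak VM. */
typedef enum { INPUT_BUTTON, INPUT_CHARACTER } InputClass;

/* Decide which of the two streams a key press belongs to.
 * keysym:        an X11 keysym value as defined in keysymdef.h
 * command_down:  nonzero if a command modifier (e.g. Ctrl or Alt) is held;
 *                which modifiers count is an assumption of this sketch */
InputClass classify_keysym(uint32_t keysym, int command_down)
{
    /* X11 function keysyms live in 0xFF00..0xFFFF. */
    if (keysym >= 0xFF00u && keysym <= 0xFFFFu)
        return INPUT_BUTTON;

    /* A printable key with a command modifier fires an operation
     * (e.g. Copy), so it goes to the button stream. */
    if (command_down)
        return INPUT_BUTTON;

    /* Everything else is text and takes the m17n-aware path. */
    return INPUT_CHARACTER;
}
```

With this shape, only the INPUT_CHARACTER path ever needs to know about encodings, which is the point of separating the two streams.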
On Saturday 16 Oct 2010 5:42:44 pm K. K. Subramaniam wrote:
> 2. Henceforth all VMs will encode all char inputs in utf8 except for
> non-en Latin1 locales. For these locales, latin1 will be used by default
> and utf8 if -compositioninput is used. The new VM will pass a dummy button
> on startup to signal input encoding, or we could introduce a new primitive
> to signal this state. New images can use it to unify clipboard and input
> interpreters. Old images can be patched to work with new VMs.

I discovered one more option to bring in m17n without affecting existing deployments. M17n environments use XMODIFIERS to define X11 input methods (XIM). It is set to none for legacy X clients. E.g.

    $ XMODIFIERS=none squeakvm

Therefore, we can modify platforms/unix/vm-display-X11/sqUnixX11.c to check if XMODIFIERS is set to anything other than none. In that case, x2sqKey can parse UTF-8 sequences to UTF32 in utf32code and MacRoman in evt at: 3.

Subbu
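
A minimal sketch of the two pieces this would need, the XMODIFIERS check and a UTF-8 to UTF-32 decoder, might look like the following. None of it is taken from sqUnixX11.c: the function names xim_requested_value, xim_requested and utf8_to_utf32 are hypothetical, and treating "@im=none" the same as "none" is an assumption added here because XMODIFIERS values are commonly written in the @im=... form.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if the given XMODIFIERS value asks for an input method,
 * 0 if it is unset, empty, or "none" (or "@im=none", an assumption). */
int xim_requested_value(const char *mods)
{
    if (mods == NULL || mods[0] == '\0')
        return 0;
    return strcmp(mods, "none") != 0 && strcmp(mods, "@im=none") != 0;
}

/* Convenience wrapper that reads the environment. */
int xim_requested(void)
{
    return xim_requested_value(getenv("XMODIFIERS"));
}

/* Decode one UTF-8 sequence from s (n bytes available) into *out.
 * Returns the number of bytes consumed, or 0 if the input is
 * truncated, malformed, overlong, or not a valid code point. */
size_t utf8_to_utf32(const unsigned char *s, size_t n, uint32_t *out)
{
    uint32_t cp, min;
    size_t len, i;

    if (n == 0)
        return 0;

    /* The lead byte gives the sequence length and the smallest code
     * point that legitimately needs that length. */
    if (s[0] < 0x80)                { cp = s[0];        len = 1; min = 0; }
    else if ((s[0] & 0xE0) == 0xC0) { cp = s[0] & 0x1F; len = 2; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { cp = s[0] & 0x0F; len = 3; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { cp = s[0] & 0x07; len = 4; min = 0x10000; }
    else
        return 0;

    if (n < len)
        return 0;

    /* Each continuation byte must be 10xxxxxx and adds six bits. */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        cp = (cp << 6) | (s[i] & 0x3F);
    }

    /* Reject overlong forms, surrogates, and values past U+10FFFF. */
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;

    *out = cp;
    return len;
}
```

For example, the three bytes E0 A4 95 decode to U+0915 (Devanagari letter KA). The resulting value is what would go into utf32code; the MacRoman value for the older event field would still have to be derived separately and is not covered here.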