
Strawman proposal for m17n


K K Subbu

Strawman proposal for m17n

Hi,

I wanted to revisit the old discussions from 2003/2005/2007 about getting the
Squeak VM to handle multilingual input in an Indic context. Indic keyboard
input can come through XIM in multiple languages regardless of the locale
setting. LANG may be set to en_US.UTF-8 or en_IN.UTF-8. XIM input method
engines are used to generate multilingual keystrokes, so the app only sees
UTF-8 encoded characters, not keys. The current design for passing multiple
encodings into the image will not work for m17n.

Currently, the logic for the keycode and the keychar (i.e. the character
typed, possibly composed of multiple keycodes) is scattered across the VMs and
images. Tying input encoding to locales complicates Indic support. Composition
is platform-specific and is best handled in the VM.

It looks to me as if the complications are due to the multiplexing of two
keyboard input streams here - control codes (buttons) and text codes
(Characters). Buttons are used to fire operations while Characters go into
text streams. Button codes need to deal with modifiers and codes < 127 but not
with m17n, AFAIK. Character codes don't need modifiers but need to deal with
m17n issues.

Here is my proposal to move forward without affecting existing deployments:

1. Map key codes into button codes (e.g. OK, Cancel, Cut, Copy, ...) in the VM
itself and pass only button codes into the image. I like Lex Spoon's proposal
to start with X11 encodings for buttons (keysymdef.h). One of the button
codes can be reserved for a soft keyboard. New images can start using these
codes and be ready to run on handhelds and tablets too.

2. Henceforth all VMs will encode all char inputs in utf8 except for non-en
Latin1 locales. For these locales, latin1 will be used by default and utf8 if
-compositioninput is used. The new VM will pass a dummy button on startup to
signal input encoding, or we could introduce a new primitive to signal this
state. New images can use it to unify clipboard and input interpreters. Old
images can be patched to work with new VMs.
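
To make the button/character split concrete, here is a rough sketch of how the
VM side could tag each input before handing it to the image. All the names
(InputKind, InputEvent, classifyInput) are hypothetical and not from any
existing VM source. The only borrowed fact is that keysymdef.h places function
and control keysyms (Return, Escape, Cancel, cursor keys, ...) in the
0xFF00-0xFFFF range. Treat it as an illustration of the idea, not a design.

```c
#include <stdint.h>

/* Keysym values copied from X11 keysymdef.h so the sketch builds without X11. */
enum { SK_RETURN = 0xFF0D, SK_ESCAPE = 0xFF1B, SK_CANCEL = 0xFF69 };

/* Hypothetical tags for the two input streams described above. */
typedef enum { EVT_BUTTON, EVT_CHAR } InputKind;

typedef struct {
    InputKind kind;
    uint32_t  code;       /* button code (keysym) or UTF-32 code point */
    uint32_t  modifiers;  /* kept for buttons, always 0 for characters */
} InputEvent;

/* Assumption: anything in the 0xFF00-0xFFFF keysym block is a button;
 * everything else is text and carries the code point composed by the
 * input method, with modifiers dropped. */
static InputEvent classifyInput(uint32_t keysym, uint32_t modifiers, uint32_t utf32)
{
    InputEvent ev;
    if (keysym >= 0xFF00 && keysym <= 0xFFFF) {
        ev.kind = EVT_BUTTON;
        ev.code = keysym;
        ev.modifiers = modifiers;
    } else {
        ev.kind = EVT_CHAR;
        ev.code = utf32;
        ev.modifiers = 0;
    }
    return ev;
}
```

With this shape the image never has to guess whether a code is a command or
text, which is the point of separating the two streams.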

I am not fully aware of past history in code page issues for latin1 locales,
so this is just a strawman. Please do point out gaps in it.

Thanks .. Subbu

K K Subbu

Re: Strawman proposal for m17n

On Saturday 16 Oct 2010 5:42:44 pm K. K. Subramaniam wrote:
> 2. Henceforth all VMs will encode all char inputs in utf8 except for
> non-en Latin1 locales. For these locales, latin1 will be used by default
> and utf8 if -compositioninput is used. The new VM will pass a dummy button
> on startup to signal input encoding, or we could introduce a new primitive
> to signal this state. New images can use it to unify clipboard and input
> interpreters. Old images can be patched to work with new VMs.
I discovered one more option to bring in m17n without affecting existing
deployments.

M17n environments use XMODIFIERS to define X11 input methods (XIM). It is set
to none for legacy X clients. E.g.
$ XMODIFIERS=none squeakvm

Therefore, we can modify platforms/unix/vm-display-X11/sqUnixX11.c to check
whether XMODIFIERS is set to anything other than none. In that case, x2sqKey
can decode the UTF-8 sequences into UTF-32 in utf32code and into MacRoman in
evt at: 3.
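
A minimal sketch of the two pieces involved, assuming nothing about the actual
contents of sqUnixX11.c: a check on the XMODIFIERS value and a decoder for one
UTF-8 sequence. The function names ximRequestedFor and utf8ToUtf32 are made up
for illustration, treating an empty XMODIFIERS as unset is my assumption, and
the MacRoman mapping is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Nonzero when the given XMODIFIERS value asks for an input method.
 * Assumption: NULL, "" and "none" all mean legacy behaviour. */
static int ximRequestedFor(const char *mods)
{
    return mods != NULL && mods[0] != '\0' && strcmp(mods, "none") != 0;
}

/* Convenience wrapper reading the real environment. */
static int ximRequested(void)
{
    return ximRequestedFor(getenv("XMODIFIERS"));
}

/* Decode one UTF-8 sequence from buf[0..len) into *out.
 * Returns the number of bytes consumed, or 0 if the input is
 * truncated, malformed, overlong, a surrogate, or above U+10FFFF. */
static size_t utf8ToUtf32(const unsigned char *buf, size_t len, uint32_t *out)
{
    size_t need, i;
    uint32_t cp, min;

    if (len == 0) return 0;
    if (buf[0] < 0x80) { *out = buf[0]; return 1; }
    else if ((buf[0] & 0xE0) == 0xC0) { need = 2; cp = buf[0] & 0x1F; min = 0x80; }
    else if ((buf[0] & 0xF0) == 0xE0) { need = 3; cp = buf[0] & 0x0F; min = 0x800; }
    else if ((buf[0] & 0xF8) == 0xF0) { need = 4; cp = buf[0] & 0x07; min = 0x10000; }
    else return 0;                       /* stray continuation or invalid lead byte */

    if (len < need) return 0;
    for (i = 1; i < need; i++) {
        if ((buf[i] & 0xC0) != 0x80) return 0;
        cp = (cp << 6) | (uint32_t)(buf[i] & 0x3F);
    }
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return 0;
    *out = cp;
    return need;
}
```

For example, the three bytes E0 A4 95 (DEVANAGARI LETTER KA) decode to the
code point U+0915, which is the kind of value utf32code would then carry.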

Subbu