RV: [HACK] Unicode keyboard input and fonts

Edgar J. De Cleene

I am forwarding this long mail for the following reason.
I have been telling the people on the colectivos team about the problem of
how text files external to Squeak are read.
One of the problems is the line ending, which is different on each operating
system (Mac = CR, Win = CR LF, Unix = LF).
Another problem is the character encoding, which follows different
standards; I suggested adopting ISO 8859-1 (Latin 1).
But look at what terrible ramifications something that seems trivial can
have.
Some of the problems of going from 8 bits to 16 bits (so that those who do
not use English as their language can work) have still not been solved.
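
To make the two file-reading problems above concrete, here is a minimal sketch in C. It is not code from Squeak or from any VM; the function name latin1_to_utf8_lf is hypothetical. It assumes the input really is ISO 8859-1, turns CR LF and lone CR into LF, and emits UTF-8 (every Latin 1 byte is the Unicode code point with the same value, so bytes from 0x80 up become two UTF-8 bytes).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of Squeak): converts ISO 8859-1 bytes to
   UTF-8 and normalizes line endings to LF.
   `out` must have room for at least 2 * n + 1 bytes.
   Returns the number of bytes written, not counting the final NUL. */
static size_t latin1_to_utf8_lf(const unsigned char *in, size_t n, char *out)
{
    size_t i, o = 0;

    assert(in != NULL && out != NULL);
    for (i = 0; i < n; i++) {
        unsigned char c = in[i];

        if (c == '\r') {
            /* CR LF (Windows): drop the CR, the LF is copied next round. */
            if (i + 1 < n && in[i + 1] == '\n')
                continue;
            /* Lone CR (classic Mac): replace it with LF. */
            out[o++] = '\n';
        } else if (c < 0x80) {
            /* ASCII is identical in Latin 1 and UTF-8. */
            out[o++] = (char)c;
        } else {
            /* Code points 0x80..0xFF need two bytes in UTF-8. */
            out[o++] = (char)(0xC0 | (c >> 6));
            out[o++] = (char)(0x80 | (c & 0x3F));
        }
    }
    out[o] = '\0';
    return o;
}
```

A caller would read the whole file into a buffer, allocate an output buffer of twice that size plus one, and call the function once. If the file is not actually Latin 1 (for example CP1251), the result will be wrong, which is exactly the kind of trouble described in the forwarded message.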

------ Forwarded message
From: danil osipchuk <[hidden email]>
Reply-To: The general-purpose Squeak developers list
<[hidden email]>
Date: Mon, 19 Jun 2006 16:29:19 +0300
To: The general-purpose Squeak developers list
<[hidden email]>
Subject: Re: [HACK] Unicode keyboard input and fonts

Yoshiki-san,

First, let me express my gratitude for all the work you have done for m17n.
It is certainly great work.

>>
>> As for me, stock image and VMs definitely are not enabled for Russian.
>
> Please keep it in mind that "Unicodizing" and "enable language YYY"
> are different issues (not only in Squeak, but in any systems that
> try to deal with them.). Current Squeak is Unicodized, but many
> languages are wanting to be implemented.

Yes, I understand this completely, and in fact my complaint originates from
an attempt to implement such support for Russian. I succeeded, but I'm not
happy with the things I had to do for it or with how I did them. It was
painful all the way down. I think most of the problems are VM-related.

What do I mean when asking whether we are fully 'unicodized'? I expected
that we would have a unified approach to text handling in Squeak: all text
inside Squeak is, well, in 'squeak' format (as in the various converters'
idioms). But it seems that we have a hybrid of an old charset-aware
implementation and a new Unicode one, and one must be very careful when
dealing with it. I'm not talking about access to external resources; it is
obviously the right thing to have multiple charset representations for them.
But how about this one:

CP1251ClipboardInterpreter>>fromSystemClipboard: aString

    | result converter |

    result := WriteStream on: (String new: aString size).
    converter := CP1251TextConverter new.
    aString do: [:each |
        result nextPut: (converter toSqueak: each macToSqueak) asCharacter.
    ].

    ^ result contents.

Note the #macToSqueak in the above. One could argue that the clipboard is an
'external' resource which happens to be in the Mac encoding and therefore
deserves special care, but why should we ever do things like this if we are
fully Unicode compliant? This trick was copied from someone else's language
environment; I'd probably never have come up with it on my own. For some
reason not all converters use this macToSqueak conversion, which is another
interesting point to consider.
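
For comparison, if everything inside the image were Unicode, one would expect the clipboard conversion to be a single step from the external charset straight to Unicode code points, with no detour through the Mac encoding. The following is a minimal sketch in C of such a one-step mapping for CP1251. It is not code from Squeak or any VM, the function names are hypothetical, and the table is deliberately partial: only ASCII, the letters А..я, and Ё/ё are handled, and every other byte becomes U+FFFD.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: maps one CP1251 byte to a Unicode code point.
   Partial on purpose: bytes 0x80..0xBF other than 0xA8 and 0xB8 are
   reported as U+FFFD (replacement character) in this sketch. */
static uint32_t cp1251_to_unicode(unsigned char b)
{
    if (b < 0x80)
        return b;                               /* ASCII range */
    if (b >= 0xC0)
        return 0x0410u + (uint32_t)(b - 0xC0);  /* 0xC0..0xFF -> U+0410..U+044F */
    if (b == 0xA8)
        return 0x0401u;                         /* capital IO */
    if (b == 0xB8)
        return 0x0451u;                         /* small io */
    return 0xFFFDu;                             /* not covered by this sketch */
}

/* Hypothetical helper: decodes n CP1251 bytes into n code points.
   `out` must have room for n entries. Returns n. */
static size_t cp1251_decode(const unsigned char *in, size_t n, uint32_t *out)
{
    size_t i;

    assert(in != NULL && out != NULL);
    for (i = 0; i < n; i++)
        out[i] = cp1251_to_unicode(in[i]);
    return n;
}
```

The point of the sketch is only that the mapping depends on the source charset and nothing else; whether the clipboard bytes really arrive in CP1251 on a given platform is a separate question that the VM has to answer.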

About Unix VMs. Please correct me if I'm wrong, but hasn't everyone who
implemented support for their language modified the VM in one way or
another? I've found that the stock VM doesn't work for me (I've tried
FreeBSD and Kubuntu): when the language is Russian, key chars don't find
their way into the image (and I tried all reasonable combinations of
command-line switches and environment variables).
When I change x2sqKey (currently x2sqKeyPlain) to x2sqKeyInput, it starts
working:

/* in sqUnixX11.c */
typedef int (*x2sqKey_t)(XKeyEvent *xevt);

static int x2sqKeyPlain(XKeyEvent *xevt);
static int x2sqKeyInput(XKeyEvent *xevt);

static x2sqKey_t x2sqKey = x2sqKeyInput;  /* was x2sqKeyPlain */

I didn't manage to make copy/paste between the Unix VM and the outer world
work (I just ran out of steam while experimenting with it).

On the Windows VM I had to modify sqWin32Window.c. When the user copies
Russian text from Squeak to the clipboard, the text is corrupted *unless*
the current keyboard layout is Russian. This happens because Windows doesn't
know anything about the locale of the data being copied. So the modification
in sqWin32Window.c is:

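/* Allocate a global block holding one locale identifier. */
hLocale = GlobalAlloc(GMEM_MOVEABLE | GMEM_DDESHARE, sizeof(DWORD));
pLocale = (DWORD *) GlobalLock(hLocale);
*pLocale = GetUserDefaultLCID();
GlobalUnlock(hLocale);
/* Tell Windows which locale the clipboard text belongs to. */
SetClipboardData(CF_LOCALE, hLocale);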

I'm not sure that everyone will be happy with it, but at least it works
for Russian.

> In Squeak, each language requires a few methods get implemented, and
> basically the native speakers need to yell what fonts they want to use
> for their language.
>
> (By definition of Unicode, there is no single font that can make
> everybody happy for Unicode. Not only in Squeak, but in any systems
> that try to deal with them.)

This is an interesting point, because I certainly used to think of Unicode
fonts as something that suits all users at once.
I guess that this is where leadingChar comes from (because I still don't
know what leadingChar is needed for and how to use it correctly)?

> For example, what font do you want to use for Russian? For performance
> reasons, it would be nice if there were a set of bitmap fonts in
> different sizes that match the Accuny fonts, and also a TT font for
> some other purposes.

Most of my time was spent battling with fonts. I did see TTCFontReader, and
it was obviously used by you and others, so it must be useful. It seems that
Bert somehow managed to 'hack' it (btw, it was the word 'hack' in the
heading of Bert's original message that triggered my response, because this
is the most hack-heavy activity I have ever been involved in :)). But I
didn't manage to do anything with it, so I had to almost completely dissect
TTFontReader and reassemble it so I could read ttf fonts (hence
http://map.squeak.org/accountbyid/2bf29ca7-cb92-4c16-ae18-6b271117a660/package/2c1a81e1-4e86-40c8-90b5-824adc4263c5).
The TTC sub-hierarchies are gone as a result, so my changes are again not
compatible with the main distribution. At least twice I've seen people
asking on the list 'how do I read my Indian or whatever ttf font into the
image' and nobody answered, so I just did it myself.

The net effect of all the above is that I've managed to add support for
Russian, but I've ended up with a system which doesn't seem to be compatible
with the Japanese environment (for instance), at both the VM and image
levels. This may certainly indicate that I misunderstood the concepts, but
the fact is: adding support for another language is not just a matter of
adding a couple of LanguageEnvironment-derived methods and classes to the
system.

>
> -- Yoshiki
>
>

Danil

------ End of forwarded message
