Unix UTF8 input

Chris Petsos
 
I've read many reports on how one can add a UTF-8 input mechanism to the
Unix VM of Squeak... I tried to do the trick with the -eventenc parameter,
but that didn't work in the latest release.
Does anyone have a compiled VM that supports this parameter?
Source code would also be desirable to see what's going on there...

If I am getting this right, ideally we want characters to enter
the image in UTF-8 representation and be interpreted there...
This should be done by setting
        -eventenc UTF-8

Thanks in advance...

Christos.


Re: Unix UTF8 input

Martin Kuball
 
On Sunday 10 June 2007, Chris Petsos wrote:
> I've read many reports on how one can add a UTF-8 input mechanism to the
> Unix VM of Squeak... I tried to do the trick with the -eventenc parameter,
> but that didn't work in the latest release.

Well, the last time I looked at the code, the Unix VM didn't support
multibyte key events, so you won't have any luck with setting the
eventenc. Some time ago I submitted a patch to make this work. It
obviously never made it into the VM, and I can't blame anyone because the
code was bad. I have improved it a bit since then. So if there is interest,
and if somebody tells me where to send it, I will submit the patch again.

Martin

Re: Unix UTF8 input

Bert Freudenberg
 

On Jun 10, 2007, at 22:00, Martin Kuball wrote:

>
> On Sunday 10 June 2007, Chris Petsos wrote:
>> I've read many reports on how one can add a UTF-8 input mechanism to the
>> Unix VM of Squeak... I tried to do the trick with the -eventenc parameter,
>> but that didn't work in the latest release.

Never saw that parameter. The Unix VM has -encoding (the encoding
used for communication with the image), -pathenc (encoding of
filenames on disk) and -textenc (encoding of 8-bit X clipboard data).

> Well, the last time I looked at the code, the Unix VM didn't support
> multibyte key events, so you won't have any luck with setting the
> eventenc. Some time ago I submitted a patch to make this work. It
> obviously never made it into the VM, and I can't blame anyone because the
> code was bad. I have improved it a bit since then. So if there is interest,
> and if somebody tells me where to send it, I will submit the patch again.

Does that use the same field as on the Mac and Win VMs? If so, sure, send
it here! UTF-8 isn't really sensible for keyboard events; we use
UTF-32 instead.
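To make Bert's point concrete: one keystroke can produce a character that needs up to four bytes in UTF-8 but always fits in a single 32-bit UTF-32 value. Greek small alpha (U+03B1), for example, becomes the two bytes 0xCE 0xB1. The following is a minimal C sketch of the UTF-32 to UTF-8 direction; the function name `utf32_to_utf8` is hypothetical and not part of the Squeak VM.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point (a UTF-32 value)
 * as UTF-8. Writes 1-4 bytes into out[] and returns the byte count,
 * or 0 if cp is not a valid Unicode scalar value. */
size_t utf32_to_utf8(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                      /* ASCII: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)     /* UTF-16 surrogates are not characters */
        return 0;
    if (cp < 0x10000) {                   /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* 11110xxx + three continuation bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                             /* beyond the Unicode range */
}
```

A caller passes a 4-byte buffer and uses the returned length; the variable length is exactly why a fixed-size per-event field is easier to fill with a UTF-32 value.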

- Bert -



Re: Unix UTF8 input

Chris Petsos
 
On Sun, 2007-06-10 at 22:18 +0200, Bert Freudenberg wrote:

>  
> On Jun 10, 2007, at 22:00, Martin Kuball wrote:
>
> >
> > On Sunday 10 June 2007, Chris Petsos wrote:
> >> I've read many reports on how one can add a UTF-8 input mechanism to the
> >> Unix VM of Squeak... I tried to do the trick with the -eventenc parameter,
> >> but that didn't work in the latest release.
>
> Never saw that parameter. The Unix VM has -encoding (the encoding
> used for communication with the image), -pathenc (encoding of
> filenames on disk) and -textenc (encoding of 8-bit X clipboard data).
>

It's a workaround I found while searching; it's an addition to the VM
by John McIntosh.
You can find the needed changes here...

http://lists.squeakfoundation.org/pipermail/vm-dev/2006-March/000491.html

Didn't work though...

> > Well, the last time I looked at the code, the Unix VM didn't support
> > multibyte key events, so you won't have any luck with setting the
> > eventenc. Some time ago I submitted a patch to make this work. It
> > obviously never made it into the VM, and I can't blame anyone because the
> > code was bad. I have improved it a bit since then. So if there is interest,
> > and if somebody tells me where to send it, I will submit the patch again.
>
Yes... please, Martin, post it to the mailing list...

> Does that use the same field as on the Mac and Win VMs? If so, sure, send
> it here! UTF-8 isn't really sensible for keyboard events; we use
> UTF-32 instead.

Christos.


Re: Unix UTF8 input

Martin Kuball
 
On Sunday 10 June 2007, Chris Petsos wrote:

> > On Jun 10, 2007, at 22:00, Martin Kuball wrote:
> > > Well, the last time I looked at the code, the Unix VM didn't support
> > > multibyte key events, so you won't have any luck with setting the
> > > eventenc. Some time ago I submitted a patch to make this work. It
> > > obviously never made it into the VM, and I can't blame anyone because the
> > > code was bad. I have improved it a bit since then. So if there is interest,
> > > and if somebody tells me where to send it, I will submit the patch again.
>
> Yes... please, Martin, post it to the mailing list...
Here you go. The patch changes the input event handling method to process
multibyte characters. Basically there are two changes. First: in the event
handler the individual bytes are accumulated and then the recode method is
called. Second: if you use a UTF-8 locale, the input encoding is set
to UTF8 (see sqUnixCharConv.c).
If you want to use a different encoding, you can override the automatic
choice by using the -eventenc parameter. For available encodings see
sqUnixCharConv.c.
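The first change, accumulating the individual bytes before recoding, might look roughly like the sketch below. It assumes the incoming bytes are UTF-8 and decodes them directly instead of going through the recode routines in sqUnixCharConv.c; the names `Utf8Acc` and `utf8_acc_push` are hypothetical and do not come from the patch.

```c
#include <stdint.h>

/* Hypothetical accumulator state for one UTF-8 sequence. */
typedef struct {
    uint32_t cp;    /* code point assembled so far */
    int      need;  /* continuation bytes still expected */
} Utf8Acc;

/* Feed one byte. Returns 1 and stores *out when a full character is
 * ready, 0 if more bytes are needed, -1 on a malformed byte (the state
 * is reset and the offending byte is dropped in this sketch).
 * Overlong forms and surrogates are not rejected here. */
int utf8_acc_push(Utf8Acc *a, unsigned char b, uint32_t *out)
{
    if (a->need == 0) {                         /* expecting a lead byte */
        if (b < 0x80) { *out = b; return 1; }   /* plain ASCII */
        if ((b & 0xE0) == 0xC0)      { a->cp = b & 0x1F; a->need = 1; }
        else if ((b & 0xF0) == 0xE0) { a->cp = b & 0x0F; a->need = 2; }
        else if ((b & 0xF8) == 0xF0) { a->cp = b & 0x07; a->need = 3; }
        else return -1;                         /* stray continuation or bad lead */
        return 0;
    }
    if ((b & 0xC0) != 0x80) {                   /* continuation byte required */
        a->need = 0;
        return -1;
    }
    a->cp = (a->cp << 6) | (b & 0x3F);          /* append 6 payload bits */
    if (--a->need > 0)
        return 0;
    *out = a->cp;
    return 1;
}
```

Feeding 0xCE and then 0xB1 yields U+03B1 only on the second call; the first call returns 0 to signal that more bytes are needed.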

The patch has to be applied to platforms/unix/vm-display-X11/sqUnixX11.c.

If you find any bugs or have questions or suggestions for improvement
please tell me.

Martin

[Attachment: sqUnixX11.patch (8K)]

Re: Unix UTF8 input

Chris Petsos
 
Hi Martin...thanks for posting...

> Here you go. The patch changes the input event handling method to process
> multibyte characters. Basically there are two changes. First: in the event
> handler the individual bytes are accumulated and then the recode method is
> called. Second: if you use a UTF-8 locale, the input encoding is set
> to UTF8 (see sqUnixCharConv.c).
> If you want to use a different encoding, you can override the automatic
> choice by using the -eventenc parameter. For available encodings see
> sqUnixCharConv.c.
>
> The patch has to be applied to platforms/unix/vm-display-X11/sqUnixX11.c.
>
> If you find any bugs or have questions or suggestions for improvement
> please tell me.
>
I applied your patch successfully, but it seems that the VM does not
respond correctly to the -eventenc setting, although it is listed in the
help notes... It reacts as if the parameter does not exist...

A previous solution I applied did set the parameter correctly, so
I'll try to combine them to see how it goes...

Christos.


Re: Unix UTF8 input

Martin Kuball
 
On Tuesday 19 June 2007, Chris Petsos wrote:

> Hi Martin...thanks for posting...
>
> > Here you go. The patch changes the input event handling method to process
> > multibyte characters. Basically there are two changes. First: in the event
> > handler the individual bytes are accumulated and then the recode method is
> > called. Second: if you use a UTF-8 locale, the input encoding is set
> > to UTF8 (see sqUnixCharConv.c).
> > If you want to use a different encoding, you can override the automatic
> > choice by using the -eventenc parameter. For available encodings see
> > sqUnixCharConv.c.
> >
> > The patch has to be applied to platforms/unix/vm-display-X11/sqUnixX11.c.
> >
> > If you find any bugs or have questions or suggestions for improvement
> > please tell me.
>
> I applied your patch successfully, but it seems that the VM does not
> respond correctly to the -eventenc setting, although it is listed in the
> help notes... It reacts as if the parameter does not exist...
>
> A previous solution I applied did set the parameter correctly, so
> I'll try to combine them to see how it goes...
>
> Christos.

If you tell me exactly what you did and what you want, I will look into it
and fix it.

Martin


Re: Unix UTF8 input

Chris Petsos
 
On Tue, 2007-06-19 at 19:28 +0200, Martin Kuball wrote:
> If you tell me exactly what you did and what you want, I will look into it
> and fix it.

Oookaaayy...
I applied the patch to Squeak-3.9-8.src.tar.gz.
I compiled the modified VM.
I ran my image with this:
         /home/Bob/Desktop/Squeak-3.9-9/build/squeak -eventenc UTF-8 ~/.npsqueak/SqueakPlugin.image

And I could not get the VM to run... it listed the VM options as if the
eventenc parameter did not exist.
OK, despite that, I could set the SQUEAK_EVENTENC environment variable so
that I could setUxXwinEncoding... but I did not get any keyboard events
when my keyboard was switched to the Greek locale...
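The override behaviour Martin describes (an explicit -eventenc value wins over the automatic, locale-based choice) can be expressed as a simple precedence chain. In this C sketch, the position of the SQUEAK_EVENTENC environment variable in that chain, the function name `choose_event_encoding`, and the ISO-8859-1 fallback are all assumptions for illustration, not taken from the VM sources.

```c
#define _POSIX_C_SOURCE 200809L
#include <langinfo.h>
#include <locale.h>
#include <stdlib.h>

/* Hypothetical precedence for picking the key-event encoding:
 *   1. an explicit -eventenc value from the command line,
 *   2. the SQUEAK_EVENTENC environment variable,
 *   3. the codeset of the user's LC_CTYPE locale,
 *   4. an arbitrary fixed fallback.
 * The ordering itself is an assumption made for this sketch. */
const char *choose_event_encoding(const char *cmdline_value)
{
    if (cmdline_value && *cmdline_value)
        return cmdline_value;

    const char *env = getenv("SQUEAK_EVENTENC");
    if (env && *env)
        return env;

    /* Adopt LANG / LC_* from the environment so nl_langinfo
     * reports the user's codeset rather than the "C" locale's. */
    if (setlocale(LC_CTYPE, "") != NULL) {
        const char *cs = nl_langinfo(CODESET);
        if (cs && *cs)
            return cs;
    }
    return "ISO-8859-1";  /* arbitrary fallback chosen for the sketch */
}
```

Under this ordering, leaving both the flag and the variable unset lets the locale decide, which matches Martin's remark that a UTF-8 locale needs no extra option.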
Also, I saw this somewhere:
        charCode= out[0]; /* only single-byte chars for now*/

Does this mean that the VM won't handle multibyte chars?

I'm looking into it too, thanks for the helpful advice and cooperation
Martin...

Christos.


Re: Unix UTF8 input

Martin Kuball
 
On Tuesday 19 June 2007, Chris Petsos wrote:

> On Tue, 2007-06-19 at 19:28 +0200, Martin Kuball wrote:
> > If you tell me exactly what you did and what you want, I will look
> > into it and fix it.
>
> Oookaaayy...
> I applied the patch to Squeak-3.9-8.src.tar.gz.
> I compiled the modified VM.
> I ran my image with this:
> /home/Bob/Desktop/Squeak-3.9-9/build/squeak -eventenc UTF-8 ~/.npsqueak/SqueakPlugin.image
>
> And I could not get the VM to run... it listed the VM options as if the
> eventenc parameter did not exist.

It should appear under the X11 options paragraph as the last option. What
exactly failed? The VM just quit? Or you could not enter any characters?
By the way there is no need to specify UTF-8 if you have UTF-8 as your
locale.
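Whether the current locale is UTF-8 can be checked with the standard C/POSIX calls setlocale(LC_CTYPE, "") and nl_langinfo(CODESET); for a locale such as en_US.UTF-8, glibc reports the codeset as "UTF-8". A minimal sketch, with hypothetical function names; codeset spellings differ between systems ("UTF-8", "utf8"), so the comparison ignores case, dashes and underscores.

```c
#define _POSIX_C_SOURCE 200809L
#include <ctype.h>
#include <langinfo.h>
#include <locale.h>

/* Returns 1 if a codeset name denotes UTF-8, ignoring case,
 * dashes and underscores (so "UTF-8", "utf8" and "UTF_8" all match). */
int codeset_is_utf8(const char *cs)
{
    const char *want = "utf8";
    while (*cs && *want) {
        if (*cs == '-' || *cs == '_') { cs++; continue; }
        if (tolower((unsigned char)*cs) != *want)
            return 0;
        cs++;
        want++;
    }
    return *cs == '\0' && *want == '\0';
}

/* Asks the C library whether the user's LC_CTYPE locale uses UTF-8.
 * setlocale(LC_CTYPE, "") adopts the LANG / LC_* environment settings. */
int locale_is_utf8(void)
{
    if (setlocale(LC_CTYPE, "") == NULL)
        return 0;  /* requested locale not installed: treat as non-UTF-8 */
    return codeset_is_utf8(nl_langinfo(CODESET));
}
```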

> OK, despite that, I could set the SQUEAK_EVENTENC environment variable so
> that I could setUxXwinEncoding... but I did not get any keyboard events
> when my keyboard was switched to the Greek locale...
> Also, I saw this somewhere:
> charCode= out[0]; /* only single-byte chars for now*/

That's because (as I understand it, at least at the time I wrote that) the
VM still uses MacRoman internally. But maybe this is no longer true (or
was not even true a year ago). If so, we can change this.
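For context on the `charCode= out[0]` line: with UTF-8 input it keeps only the lead byte, so Greek small alpha (0xCE 0xB1) would arrive as 0xCE. One way out, assuming the event record gets a separate UTF-32 field like the one Bert mentions for the Mac and Windows VMs, is to fill both a legacy 8-bit code and the full code point. The struct and function below are hypothetical sketches, not the VM's actual event structure.

```c
#include <stdint.h>

/* Hypothetical key-event record carrying both a legacy 8-bit code
 * and a full UTF-32 value. Field names are illustrative only; the
 * real VM event structure may differ. */
typedef struct {
    int      charCode;   /* legacy single-byte code, 0 if not representable */
    uint32_t utf32Code;  /* full Unicode code point */
} KeyEventSketch;

/* Fill both fields from an already-decoded code point. Only ASCII is
 * passed through as charCode here; mapping 0x80-0xFF to MacRoman would
 * need a conversion table, which this sketch omits. */
KeyEventSketch make_key_event(uint32_t cp)
{
    KeyEventSketch ev;
    ev.utf32Code = cp;
    ev.charCode  = (cp < 0x80) ? (int)cp : 0;
    return ev;
}
```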

Martin

Re: Unix UTF8 input

Chris Petsos
 
On Thu, 2007-06-21 at 21:58 +0200, Martin Kuball wrote:

> It should appear under the X11 options paragraph as the last option. What
> exactly failed? The VM just quit? Or you could not enter any characters?
> By the way there is no need to specify UTF-8 if you have UTF-8 as your
> locale.

Indeed, it appears under the X11 options, but when I set it the VM does not start.
Now... here comes my Linux ignorance... how do I set UTF-8 as my locale?
Is it
        LANG=el-GR.utf8

>
> > OK, despite that, I could set the SQUEAK_EVENTENC environment variable so
> > that I could setUxXwinEncoding... but I did not get any keyboard events
> > when my keyboard was switched to the Greek locale...
> > Also, I saw this somewhere:
> > charCode= out[0]; /* only single-byte chars for now*/
>
> That's because (as I understand it, at least at the time I wrote that) the
> VM still uses MacRoman internally. But maybe this is no longer true (or
> was not even true a year ago). If so, we can change this.

I've added some printing of the keyboard event buffer to my input interpreter so
that I can see whether keyboard events are generated and with what values...
When I switch the keyboard to my locale, nothing is printed to the
Transcript... no events are generated at all...
Why is that?

Christos.


Re: Unix UTF8 input

Martin Kuball
 
On Friday 22 June 2007, Chris Petsos wrote:

> On Thu, 2007-06-21 at 21:58 +0200, Martin Kuball wrote:
> > It should appear under the X11 options paragraph as the last option.
> > What exactly failed? The VM just quit? Or you could not enter any
> > characters? By the way there is no need to specify UTF-8 if you have
> > UTF-8 as your locale.
>
> Indeed, it appears under the X11 options, but when I set it the VM does
> not start. Now... here comes my Linux ignorance... how do I set UTF-8 as
> my locale? Is it
> LANG=el-GR.utf8

Yes, that is part of it. Could you send me the output of the locale
command?

What exactly happens if you start the VM with -eventenc. Is there any error
message or does it simply quit?

Now that I'm thinking about this stuff again, I believe that I should not
have added the eventenc parameter at all. The encoding of the key events
delivered by the X server is determined by the locale setting. We too
should use the locale setting to figure out how to map from the encoding
of the X server to the Squeak-internal encoding. No user intervention is
required here; it could only do harm.
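Martin's proposal can be sketched with standard POSIX facilities: after setlocale(LC_CTYPE, ""), nl_langinfo(CODESET) names the locale's encoding, and iconv converts bytes from that encoding to UTF-32. This is a substitute technique chosen for illustration, not the code in the patch (which uses the converters in sqUnixCharConv.c), and the function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <iconv.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <stdint.h>

/* Convert inlen bytes in encoding `fromcode` to UTF-32 code points.
 * Returns the number of code points written to out[], or -1 on error
 * (unknown encoding, invalid or incomplete input, or output too small).
 * Converting to big-endian UTF-32 and assembling the bytes by hand keeps
 * the result independent of host byte order. At most 64 code points are
 * handled per call in this sketch. */
long recode_to_utf32(const char *fromcode, const char *in, size_t inlen,
                     uint32_t *out, size_t outmax)
{
    unsigned char buf[256];
    if (outmax * 4 > sizeof buf)
        outmax = sizeof buf / 4;

    iconv_t cd = iconv_open("UTF-32BE", fromcode);
    if (cd == (iconv_t)-1)
        return -1;

    char *inp = (char *)in;          /* iconv takes non-const char ** */
    size_t inleft = inlen;
    char *outp = (char *)buf;
    size_t outleft = outmax * 4;
    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return -1;

    size_t n = (outmax * 4 - outleft) / 4;
    for (size_t i = 0; i < n; i++)
        out[i] = ((uint32_t)buf[4 * i] << 24) | ((uint32_t)buf[4 * i + 1] << 16)
               | ((uint32_t)buf[4 * i + 2] << 8) | (uint32_t)buf[4 * i + 3];
    return (long)n;
}

/* Let the user's locale, not a command-line flag, pick the source encoding. */
long recode_from_locale(const char *in, size_t inlen, uint32_t *out, size_t outmax)
{
    setlocale(LC_CTYPE, "");
    return recode_to_utf32(nl_langinfo(CODESET), in, inlen, out, outmax);
}
```

Returning -1 for a truncated sequence is deliberate here: a caller that receives bytes piecemeal would keep them buffered, as in Martin's accumulation step, and retry once more bytes arrive.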

I will try this and post a new patch.

> > > OK, despite that, I could set the SQUEAK_EVENTENC environment
> > > variable so that I could setUxXwinEncoding... but I did not get any
> > > keyboard events when my keyboard was switched to the Greek locale...
> > > Also, I saw this somewhere:
> > > charCode= out[0]; /* only single-byte chars for now*/
> >
> > That's because (as I understand it, at least at the time I wrote
> > that) the VM still uses MacRoman internally. But maybe this is no longer
> > true (or was not even true a year ago). If so, we can change
> > this.
>
> I've added some printing of the keyboard event buffer to my input interpreter so
> that I can see whether keyboard events are generated and with what values...
> When I switch the keyboard to my locale, nothing is printed to the
> Transcript... no events are generated at all...
> Why is that?

So you were able to start the VM but you could not enter any characters?

Martin


Re: Unix UTF8 input

Chris Petsos
 
On Sun, 2007-06-24 at 19:53 +0200, Martin Kuball wrote:
> Yes, that is part of it. Could you send me the output of the locale
> command?
I undid some things so...here you go...

LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=


> What exactly happens if you start the VM with -eventenc. Is there any error
> message or does it simply quit?

I don't get any error messages, but the VM responds as if the -eventenc
parameter does not exist. That is, running...

/home/Bob/Desktop/Squeak-3.9-9/build/squeak -eventenc UTF-8 ~/.npsqueak/SqueakPlugin.image

is identical to running...

/home/Bob/Desktop/Squeak-3.9-9/build/squeak -edsfrsdf UTF-8 ~/.npsqueak/SqueakPlugin.image

> Now that I'm thinking about this stuff again, I believe that I should not
> have added the eventenc parameter at all. The encoding of the key events
> delivered by the X server is determined by the locale setting. We too
> should use the locale setting to figure out how to map from the encoding
> of the X server to the Squeak-internal encoding. No user intervention is
> required here; it could only do harm.

I would agree... Another idea floating around is to use the xutf8*
set of functions... from what I know, this set is going to be integrated
into the OLPC distribution too, so it may come in handy... but... one step
at a time.

> > I've added some printing of the keyboard event buffer to my input interpreter so
> > that I can see whether keyboard events are generated and with what values...
> > When I switch the keyboard to my locale, nothing is printed to the
> > Transcript... no events are generated at all...
> > Why is that?
>
> So you were able to start the VM but you could not enter any characters?

Yes, I could start the VM without the -eventenc parameter, but I could
not enter any non-English characters...

Christos.


Re: Unix UTF8 input

Martin Kuball
 
On Monday 25 June 2007, Chris Petsos wrote:

> On Sun, 2007-06-24 at 19:53 +0200, Martin Kuball wrote:
> > Yes, that is part of it. Could you send me the output of the locale
> > command?
>
> I undid some things so...here you go...
>
> LANG=en_US.UTF-8
> LC_CTYPE="en_US.UTF-8"
> LC_NUMERIC="en_US.UTF-8"
> LC_TIME="en_US.UTF-8"
> LC_COLLATE="en_US.UTF-8"
> LC_MONETARY="en_US.UTF-8"
> LC_MESSAGES="en_US.UTF-8"
> LC_PAPER="en_US.UTF-8"
> LC_NAME="en_US.UTF-8"
> LC_ADDRESS="en_US.UTF-8"
> LC_TELEPHONE="en_US.UTF-8"
> LC_MEASUREMENT="en_US.UTF-8"
> LC_IDENTIFICATION="en_US.UTF-8"
> LC_ALL=
>
> > What exactly happens if you start the VM with -eventenc. Is there any
> > error message or does it simply quit?
>
> I don't get any error messages, but the VM responds as if the -eventenc
> parameter does not exist. That is, running...
>
> /home/Bob/Desktop/Squeak-3.9-9/build/squeak -eventenc UTF-8 ~/.npsqueak/SqueakPlugin.image
>
> is identical to running...
>
> /home/Bob/Desktop/Squeak-3.9-9/build/squeak -edsfrsdf UTF-8 ~/.npsqueak/SqueakPlugin.image
>
> > Now that I'm thinking about this stuff again, I believe that I should
> > not have added the eventenc parameter at all. The encoding of the
> > key events delivered by the X server is determined by the locale
> > setting. We too should use the locale setting to figure out how to map
> > from the encoding of the X server to the Squeak-internal encoding. No
> > user intervention is required here; it could only do harm.
>
> I would agree... Another idea floating around is to use the xutf8*
> set of functions... from what I know, this set is going to be integrated
> into the OLPC distribution too, so it may come in handy... but... one step
> at a time.

Do you know where I can get more information about these xutf8* functions?

> > > I've added some printing of the keyboard event buffer to my input
> > > interpreter so that I can see whether keyboard events are generated and
> > > with what values... When I switch the keyboard to my locale, nothing is
> > > printed to the Transcript... no events are generated at all...
> > > Why is that?
> >
> > So you were able to start the VM but you could not enter any
> > characters?
>
> Yes, I could start the VM without the -eventenc parameter, but I could
> not enter any non-English characters...

That means the patch did not actually change anything for you, right?
Are you really sure you're using the right VM? The one you compiled? Sorry, I
have to ask, but this is strange.

Martin