[squeak-dev] Problem with typing Czech characters in Squeak 3.10 on Ubuntu 9.04

18 messages

[squeak-dev] Problem with typing Czech characters in Squeak 3.10 on Ubuntu 9.04

Michal Perutka-2
Hi all,

I need to type Czech characters in Squeak. I have a Latin-2 font, so I tried to set up a Latin-2 environment in Squeak as follows:

StrikeFontSet installExternalFontFileName6: 'latin2.out'
    encoding: 14
    encodingName: #Latin2
    textStyleName: #DefaultMultiStyle.

Locale switchToID: (LocaleID isoLanguage: 'cs').

It seems OK: I can see Czech characters with diacritical marks, for example by evaluating this in a Workspace:
(Character value: 236) asString convertFromEncoding: 'iso-8859-2'
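The expression above treats 236 as an ISO-8859-2 (Latin-2) byte and converts it to the corresponding Unicode character. The short sketch below shows the same decoding outside Squeak. It is written in Python only because that is easy to run, and `latin2_to_code_point` is a hypothetical helper:

```python
def latin2_to_code_point(byte_value: int) -> int:
    """Decode one ISO-8859-2 byte and return its Unicode code point."""
    return ord(bytes([byte_value]).decode("iso-8859-2"))

# 236, 185 and 232 are the Latin-2 bytes for e-caron, s-caron and c-caron.
for b in (236, 185, 232):
    print(b, "->", f"U+{latin2_to_code_point(b):04X}")
```

If the same byte 236 were read as Latin-1, it would be ì (U+00EC), so the encoding name matters.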

Now, when I run Squeak on Ubuntu like this:

LANG="cs_CZ.ISO-8859-2"
LC_ALL="cs_CZ.ISO-8859-2"
squeak

I can type lower-case Czech letters ě š č ř ž ý á í é ů ú; the keys with these letters work. But when I press a key with a diacritical mark and then a letter key, I get only the letter followed by a question mark: e? s? c?, for example. So I am not able to type Czech upper-case characters (like Ě Š Č etc.).

Where is the problem? In the Squeak VM (I use the latest 3.10-6 version) or in Squeak itself? Please help.

Thanks

Michal



[squeak-dev] Re: Problem with typing Czech characters in Squeak 3.10 on Ubuntu 9.04

Andreas.Raab
Michal Perutka wrote:
> I can type lower-case Czech letters ě š č ř ž ý á í é ů ú; the keys
> with these letters work. But when I press a key with a diacritical
> mark and then a letter key, I get only the letter followed by a
> question mark: e? s? c?, for example. So I am not able to type Czech
> upper-case characters (like Ě Š Č etc.).
>
> Where is the problem? In the Squeak VM (I use the latest 3.10-6 version)
> or in Squeak itself? Please help.

I don't know too much about Linux input handling but it looks like a
mismatch between VM and image (i.e., that the VM is reporting two codes
that the image needs to merge and that the image doesn't really know
what to do with it).

To track this down, you might start by looking at the incoming events in
EventSensor (but VERY carefully; screwing up there is a great recipe for
a force-quit-restart cycle ;) and see if the event codes look reasonable
to you. Also check out the other input converters - some of them might
already be doing what you need.

Cheers,
   - Andreas


Re: [squeak-dev] Re: Problem with typing Czech characters in Squeak 3.10 on Ubuntu 9.04

Michal Perutka-2
2009/8/14 Andreas Raab <[hidden email]>
> Michal Perutka wrote:
>> I can type lower-case Czech letters ě š č ř ž ý á í é ů ú; the keys with these letters work. But when I press a key with a diacritical mark and then a letter key, I get only the letter followed by a question mark: e? s? c?, for example. So I am not able to type Czech upper-case characters (like Ě Š Č etc.).
>>
>> Where is the problem? In the Squeak VM (I use the latest 3.10-6 version) or in Squeak itself? Please help.
>
> I don't know too much about Linux input handling but it looks like a mismatch between VM and image (i.e., that the VM is reporting two codes that the image needs to merge and that the image doesn't really know what to do with it).
>
> To track this down, you might start by looking at the incoming events in EventSensor (but VERY carefully; screwing up there is a great recipe for a force-quit-restart cycle ;) and see if the event codes look reasonable to you. Also check out the other input converters - some of them might already be doing what you need.
>
> Cheers,
>  - Andreas

Thanks.

So, in EventSensor>>processKeyboardEvent: I inserted a line
Transcript show: evt asString; show: String cr.
(or I can insert that line in ISO88592InputInterpreter>>nextCharFrom:firstEvt:, the result is the same)

Then, when I type á (=225), I get
#(2 2841355 225 1 0 225 0 0)
#(2 2841355 225 0 0 225 0 0)
#(2 2841506 225 2 0 225 0 0)

When I type acute accent key and then a (=97), first I get
#(2 2862057 180 2 0 0 0 0)

then
#(2 2872015 97 1 0 97 0 0)
#(2 2872015 97 0 0 97 0 0)
#(2 2872015 769 1 0 769 0 0)
#(2 2872015 769 0 0 769 0 0)
#(2 2872191 97 2 0 97 0 0)

and as result I get a?, not á
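Each array above is one keyboard event as delivered by the VM. The standard Squeak layout is assumed here. Element 1 is the event type (2 for keyboard), followed by the time stamp, the legacy character code, and the press code (0 = character, 1 = key down, 2 = key up). Then come the modifier bits, the UTF-32 code point, a reserved slot and the window index. Under that assumption, the second dump can be decoded as in this Python sketch. It is illustrative only, and `KeyEvent` and `decode` are hypothetical names:

```python
from typing import NamedTuple

# Press codes (element 4) in Squeak keyboard events.
PRESS = {0: "char", 1: "down", 2: "up"}

class KeyEvent(NamedTuple):
    # Element numbers are 1-based, as in Smalltalk (evt third, evt sixth).
    timestamp: int   # element 2
    char_code: int   # element 3: legacy (MacRoman) character code
    press: str       # element 4: "char", "down" or "up"
    modifiers: int   # element 5: modifier key bits
    utf32: int       # element 6: Unicode code point

def decode(evt):
    """Decode one raw keyboard event array (type 2) into named fields."""
    if evt[0] != 2:
        raise ValueError("not a keyboard event")
    return KeyEvent(evt[1], evt[2], PRESS[evt[3]], evt[4], evt[5])

# Michal's dump for "acute accent key, then a".
dump = [
    (2, 2862057, 180, 2, 0, 0, 0, 0),
    (2, 2872015, 97, 1, 0, 97, 0, 0),
    (2, 2872015, 97, 0, 0, 97, 0, 0),
    (2, 2872015, 769, 1, 0, 769, 0, 0),
    (2, 2872015, 769, 0, 0, 769, 0, 0),
    (2, 2872191, 97, 2, 0, 97, 0, 0),
]
events = [decode(e) for e in dump]
typed = [e.utf32 for e in events if e.press == "char"]
print([f"U+{cp:04X}" for cp in typed])  # ['U+0061', 'U+0301']
```

Only two character events arrive: U+0061 and U+0301 (769, COMBINING ACUTE ACCENT). The dead key itself produces nothing but a key-up with code point 0. The second code point is presumably where the trailing "?" comes from.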

But what next?

Cheers,
Michal



Re: [squeak-dev] Re: Problem with typing Czech characters in Squeak 3.10 on Ubuntu 9.04

Yoshiki Ohshima-2
At Fri, 14 Aug 2009 23:18:11 +0200,
Michal Perutka wrote:

>
> So, in EventSensor>>processKeyboardEvent: I inserted a line
> Transcript show: evt asString; show: String cr.
> (or I can insert that line in ISO88592InputInterpreter>>nextCharFrom:firstEvt:, the result is the same)
>
> Then, when I type á (=225), I get
> #(2 2841355 225 1 0 225 0 0)
> #(2 2841355 225 0 0 225 0 0)
> #(2 2841506 225 2 0 225 0 0)
>
> When I type acute accent key and then a (=97), first I get
> #(2 2862057 180 2 0 0 0 0)
>
> then
> #(2 2872015 97 1 0 97 0 0)
> #(2 2872015 97 0 0 97 0 0)
> #(2 2872015 769 1 0 769 0 0)
> #(2 2872015 769 0 0 769 0 0)
> #(2 2872191 97 2 0 97 0 0)
>
> and as result I get a?, not á

  The VM appears to be sending the base character and the composition
accent character.  That in itself is correct, but the image side has to
do something.

  In the Etoys image, there is a class called
UnicodeCompositionStream.  If you put 97 (= 16r61) and 769 (=
16r301) into that stream, you get the accented a out of it.  And in the
Etoys image, the ParagraphEditor uses it to make the composed
character.  It should work OK.
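UnicodeCompositionStream itself is Etoys code and is not shown in the thread. The same idea can be illustrated in Python with the standard `unicodedata` module, which composes the two code points the VM sends by applying canonical composition (NFC). This uses Python's API, not Squeak's, and `compose_pair` is a hypothetical helper:

```python
import unicodedata

def compose_pair(base: int, mark: int):
    """Return the precomposed code point for base + combining mark, or None."""
    composed = unicodedata.normalize("NFC", chr(base) + chr(mark))
    return ord(composed) if len(composed) == 1 else None

print(hex(compose_pair(97, 769)))      # 0xe1  (a + acute -> a-acute)
print(hex(compose_pair(0x45, 0x30C)))  # 0x11a (E + caron -> E-caron)
print(compose_pair(0x62, 0x300))       # None  (no precomposed b-grave)
```

The last line shows the limit of any table-driven approach. When Unicode defines no precomposed form, the pair stays two code points.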

  Alternatively (or along with it), you could turn on the Pango
renderer, which takes a non-composed sequence and renders it properly.

-- Yoshiki


[squeak-dev] Re: Problem with typing Czech characters in Squeak 3.10 on Ubuntu 9.04

Andreas.Raab
Yoshiki Ohshima wrote:
>   The VM appears to be sending the base character and the composition
> accent character.  That in itself is correct, but the image side has to
> do something.
>
> In the Etoys image, there is a class called
> UnicodeCompositionStream.  If you put 97 (= 16r61) and 769 (=
> 16r301) into that stream, you get the accented a out of it.

Sweet! I was just looking at it, it looks as if the code that generated
the mapping was stripped out. Do you still have it somewhere? Also, is
the rule of combinations complete or does it only cover the common
combination rules?

In any case, this is hugely valuable - I'll check to see how we get this
into Squeak.

Cheers,
   - Andreas


Re: [squeak-dev] Re: Problem with typing Czech characters in Squeak 3.10 on Ubuntu 9.04

Karl Ramberg
On 8/17/09, Andreas Raab <[hidden email]> wrote:

> Yoshiki Ohshima wrote:
>>   The VM appears to be sending the base character and the composition
>> accent character.  That in itself is correct, but the image side has to
>> do something.
>>
>> In the Etoys image, there is a class called
>> UnicodeCompositionStream.  If you put 97 (= 16r61) and 769 (=
>> 16r301) into that stream, you get the accented a out of it.
>
> Sweet! I was just looking at it, it looks as if the code that generated
> the mapping was stripped out. Do you still have it somewhere? Also, is
> the rule of combinations complete or does it only cover the common
> combination rules?
>
> In any case, this is hugely valuable - I'll check to see how we get this
> into Squeak.
>
> Cheers,
>    - Andreas
>
>


Re: [squeak-dev] Re: Problem with typing Czech characters in Squeak 3.10 on Ubuntu 9.04

Michael van der Gulik-2
In reply to this post by Yoshiki Ohshima-2


On Tue, Aug 18, 2009 at 9:12 AM, Yoshiki Ohshima <[hidden email]> wrote:
> At Fri, 14 Aug 2009 23:18:11 +0200,
> Michal Perutka wrote:
>>
>> So, in EventSensor>>processKeyboardEvent: I inserted a line
>> Transcript show: evt asString; show: String cr.
>> (or I can insert that line in ISO88592InputInterpreter>>nextCharFrom:firstEvt:, the result is the same)
>>
>> Then, when I type á (=225), I get
>> #(2 2841355 225 1 0 225 0 0)
>> #(2 2841355 225 0 0 225 0 0)
>> #(2 2841506 225 2 0 225 0 0)
>>
>> When I type acute accent key and then a (=97), first I get
>> #(2 2862057 180 2 0 0 0 0)
>>
>> then
>> #(2 2872015 97 1 0 97 0 0)
>> #(2 2872015 97 0 0 97 0 0)
>> #(2 2872015 769 1 0 769 0 0)
>> #(2 2872015 769 0 0 769 0 0)
>> #(2 2872191 97 2 0 97 0 0)
>>
>> and as result I get a?, not á
>
>  The VM appears to be sending the base character and the composition
> accent character.  That in itself is correct, but the image side has to
> do something.


Assuming the Unicode characters 97 ("a") followed by 16r301 (combining acute accent) in a String, should the correct behaviour be to consider this one character or two?

Given the String 'xxa'xx' (where "a" is Unicode 97 and the middle ' is Unicode 16r301), would "aString at: 3" return a single composed character or an uncomposed character?

Or should Unicode-capable Strings not be indexable at all, to sidestep issues like this completely?

Gulik

--
http://gulik.pbwiki.com/



Re: [squeak-dev] Re: Problem with typing Czech characters in Squeak 3.10 on Ubuntu 9.04

Yoshiki Ohshima-2
In reply to this post by Andreas.Raab
At Mon, 17 Aug 2009 14:38:00 -0700,
Andreas Raab wrote:

>
> Yoshiki Ohshima wrote:
> >   The VM appears to be sending the base character and the composition
> > accent character.  That in itself is correct, but the image side has to
> > do something.
> >
> > In the Etoys image, there is a class called
> > UnicodeCompositionStream.  If you put 97 (= 16r61) and 769 (=
> > 16r301) into that stream, you get the accented a out of it.
>
> Sweet! I was just looking at it, it looks as if the code that generated
> the mapping was stripped out. Do you still have it somewhere? Also, is
> the rule of combinations complete or does it only cover the common
> combination rules?

  Hehe, probably proper comments in methods and classes would be a
good idea, as the method comment is wrong and there is nothing that tells
you (err, us, really) what to do.

  But here it is.  Download the following:

http://unicode.org/Public/UNIDATA/UnicodeData.txt

Put it in the directory with your image, and then evaluate:

CombinedChar parseCompositionMappingFrom:
   ((FileStream readOnlyFileNamed: 'UnicodeData.txt') wantsLineEndConversion: true)

That should do it.  (Actually the resulting dictionaries are bigger than the
ones in the Etoys image, which haven't been updated for some time...)
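The body of CombinedChar class>>parseCompositionMappingFrom: is not shown in the thread. The Python sketch below shows roughly what building such a table from UnicodeData.txt involves. It reflects an assumption about the approach and is not a port of that method. Field 5 (0-based) of each line holds the decomposition mapping. Canonical two-code-point mappings give (base, mark) -> composed entries. Mappings with a `<tag>` are compatibility decompositions and are skipped, as are singletons. A complete table would also honor the exclusions in CompositionExclusions.txt, which this sketch ignores. The sample lines stand in for the downloaded file:

```python
SAMPLE = """\
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
00BD;VULGAR FRACTION ONE HALF;No;0;ON;<fraction> 0031 2044;;;1/2;N;FRACTION ONE HALF;;;;
00E1;LATIN SMALL LETTER A WITH ACUTE;Ll;0;L;0061 0301;;;;N;LATIN SMALL LETTER A ACUTE;;00C1;;00C1
011A;LATIN CAPITAL LETTER E WITH CARON;Lu;0;L;0045 030C;;;;N;LATIN CAPITAL LETTER E HACEK;;;011B;
0161;LATIN SMALL LETTER S WITH CARON;Ll;0;L;0073 030C;;;;N;LATIN SMALL LETTER S HACEK;;0160;;0160
212B;ANGSTROM SIGN;Lu;0;L;00C5;;;;N;ANGSTROM UNIT;;;00E5;
"""

def parse_composition_mapping(lines):
    """Build {(base, mark): composed} from UnicodeData.txt-style lines."""
    table = {}
    for line in lines:
        fields = line.split(";")
        if len(fields) < 6 or not fields[5]:
            continue                      # no decomposition mapping
        mapping = fields[5]
        if mapping.startswith("<"):
            continue                      # compatibility mapping, e.g. <fraction>
        parts = [int(p, 16) for p in mapping.split()]
        if len(parts) != 2:
            continue                      # singletons (e.g. ANGSTROM SIGN) never recompose
        table[(parts[0], parts[1])] = int(fields[0], 16)
    return table

table = parse_composition_mapping(SAMPLE.splitlines())
print(len(table))                         # 3
print(hex(table[(0x61, 0x301)]))          # 0xe1
```

With the real file, the lines of `open("UnicodeData.txt", encoding="ascii")` can be passed in directly. The trailing newline only affects the last field, which the parser does not read.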

-- Yoshiki



Re: [squeak-dev] Re: Problem with typing Czech characters in Squeak 3.10 on Ubuntu 9.04

Yoshiki Ohshima-2
In reply to this post by Michael van der Gulik-2
At Tue, 18 Aug 2009 10:12:21 +1200,
Michael van der Gulik wrote:
>
> Assuming the Unicode characters 97 ("a") followed by 16r301 (combining acute accent) in a String, should the correct behaviour be to
> consider this one character or two?
>
> Given the String 'xxa'xx' (where "a" is Unicode 97 and the middle ' is Unicode 16r301), would "aString at: 3" return a
> single composed character or an uncomposed character?
>
> Or should Unicode-capable Strings not be indexable at all, to sidestep issues like this completely?

  A Unicode string can be indexable, but basically don't expect to
always get a useful "character" (displayable, comparable, etc.).  What
you get back is a code point, not a character.  For comparison and
other purposes, you need to "normalize" the string first, and the result
can be a single composed character or an uncomposed sequence.

  However, when do you need "aString at: 3"?  From the Squeak point of
view, as long as some relationships hold (like #at: agreeing with
#size), random-access indexing is rarely needed, and where it is, it
needs closer attention.
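Gulik's example makes the code-point-versus-character distinction concrete. It is shown here in Python as an illustration only. Python strings are also indexed by code point, and Python counts from 0 where Squeak's at: starts at 1:

```python
import unicodedata

s = "xxa\u0301xx"  # 'a' followed by U+0301 COMBINING ACUTE ACCENT
print(len(s), f"U+{ord(s[2]):04X}", f"U+{ord(s[3]):04X}")  # 6 U+0061 U+0301

nfc = unicodedata.normalize("NFC", s)  # compose where possible
print(len(nfc), f"U+{ord(nfc[2]):04X}")  # 5 U+00E1

nfd = unicodedata.normalize("NFD", nfc)  # decompose again
print(nfd == s)  # True
```

So index 2 (Squeak's at: 3) is the bare a in one form and the composed á in the other. Comparisons are only meaningful once both strings are normalized to the same form.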

-- Yoshiki



[squeak-dev] Linux input testers needed (was: Re: Problem with typing Czech characters in Squeak 3.10 on Ubuntu 9.04)

Andreas.Raab
In reply to this post by Michal Perutka-2
Folks -

Armed with the input from Yoshiki, here is an attempt at addressing the
problem of decomposed Unicode input. I decided to do the handling a
little differently from Etoys by providing a UnicodeInputInterpreter
that does the composition in HandMorph since all the hooks were already
available.

[Yoshiki - out of curiosity, what is the reason why in the Etoys image
this level of input conversion is not managed via the input interpreter
but rather separately in ParagraphEditor?]

I need some people who can test this stuff though. So Michal or anyone
else who does m17n input on Linux, please try the following:
1) Verify that your platform generates non-composed input (see the
original message below)
2) Load the attached code. It will go and fetch the Unicode data and
install the composition mappings in Unicode.
3) Install the new input converter using:
      World primaryHand keyboardInterpreter: UnicodeInputInterpreter new.
4) Type the same sequence(s) as in 1).

If everything goes as it should, you should see the composed character.
If it doesn't, it would be interesting to see what the state of the
input event queue is at that point in time (print out the contents of
"sensor eventQueue" inside of UnicodeInputInterpreter>>nextCharFrom:firstEvt:).

Change Set: UnicodeInput-ar
Date: 17 August 2009
Author: Andreas Raab

Simplified handling for decomposed Unicode input.
UnicodeInputInterpreter deals with the composition based on the
composition operations provided by Unicode:
- Unicode>>isComposed: aCharacter
- Unicode>>isComposable: aCharacter
- Unicode>>compose: baseChar with: compositionChar
- Unicode>>decompose: composedChar
See the method comments for more information.
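The approach appears to work like this: take a character event, and if the following character event is a combining mark that composes with it, emit the precomposed character instead. The Python sketch below illustrates that idea under stated assumptions. It is hypothetical and is not the code in UnicodeInput-ar.2.cs. It uses Python's `unicodedata` NFC in place of Unicode>>compose:with:, and it reads the UTF-32 code point from element 6 of each event array. `typed_text` and `compose_pair` are illustrative names:

```python
import unicodedata

KEYBOARD, KEY_CHAR = 2, 0  # event type and "char" press code

def compose_pair(base, mark):
    """Precomposed code point for base + mark, or None if none exists."""
    composed = unicodedata.normalize("NFC", chr(base) + chr(mark))
    return ord(composed) if len(composed) == 1 else None

def typed_text(events):
    """Turn raw keyboard event arrays into text, merging each combining
    mark into the preceding character when a precomposed form exists."""
    out = []
    for evt in events:
        if evt[0] != KEYBOARD or evt[3] != KEY_CHAR:
            continue                       # skip key-down / key-up events
        code = evt[5]                      # element 6: UTF-32 code point
        if out and unicodedata.combining(chr(code)):
            merged = compose_pair(ord(out[-1]), code)
            if merged is not None:
                out[-1] = chr(merged)      # replace base with composed char
                continue
        out.append(chr(code))              # nothing to compose: keep as-is
    return "".join(out)

# Michal's event dump for "acute accent key, then a":
dump = [
    (2, 2862057, 180, 2, 0, 0, 0, 0),
    (2, 2872015, 97, 1, 0, 97, 0, 0),
    (2, 2872015, 97, 0, 0, 97, 0, 0),
    (2, 2872015, 769, 1, 0, 769, 0, 0),
    (2, 2872015, 769, 0, 0, 769, 0, 0),
    (2, 2872191, 97, 2, 0, 97, 0, 0),
]
print(typed_text(dump) == "\u00e1")        # True
```

Like any input interpreter, this only sees marks that arrive right after their base character in the event stream. A mark typed after moving the cursor, or pasted text that starts with a mark, is outside its view.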

If this works okay for people, I'll push it into the trunk.

Cheers,
   - Andreas



Michal Perutka wrote:

> 2009/8/14 Andreas Raab <[hidden email]>
>
>     Michal Perutka wrote:
>
>         I can type lower-case Czech letters ě š č ř ž ý á í é ů ú; the
>         keys with these letters work. But when I press a key with a
>         diacritical mark and then a letter key, I get only the letter
>         followed by a question mark: e? s? c?, for example. So I am not
>         able to type Czech upper-case characters (like Ě Š Č etc.).
>
>         Where is the problem? In the Squeak VM (I use the latest 3.10-6
>         version) or in Squeak itself? Please help.
>
>
>     I don't know too much about Linux input handling but it looks like a
>     mismatch between VM and image (i.e., that the VM is reporting two
>     codes that the image needs to merge and that the image doesn't
>     really know what to do with it).
>
>     To track this down, you might start by looking at the incoming
>     events in EventSensor (but VERY carefully; screwing up there is a
>     great recipe for a force-quit-restart cycle ;) and see if the event
>     codes look reasonable to you. Also check out the other input
>     converters - some of them might already be doing what you need.
>
>     Cheers,
>      - Andreas
>
>
> Thanks.
>
> So, in EventSensor>>processKeyboardEvent: I inserted a line
> Transcript show: evt asString; show: String cr.
> (or I can insert that line in
> ISO88592InputInterpreter>>nextCharFrom:firstEvt:, the result is the same)
>
> Then, when I type á (=225), I get
> #(2 2841355 225 1 0 225 0 0)
> #(2 2841355 225 0 0 225 0 0)
> #(2 2841506 225 2 0 225 0 0)
>
> When I type acute accent key and then a (=97), first I get
> #(2 2862057 180 2 0 0 0 0)
>
> then
> #(2 2872015 97 1 0 97 0 0)
> #(2 2872015 97 0 0 97 0 0)
> #(2 2872015 769 1 0 769 0 0)
> #(2 2872015 769 0 0 769 0 0)
> #(2 2872191 97 2 0 97 0 0)
>
> and as result I get a?, not á
>
> But what next?
>
> Cheers,
> Michal
>
>
> ------------------------------------------------------------------------
>
>



UnicodeInput-ar.2.cs (7K)

Re: [squeak-dev] Linux input testers needed (was: Re: Problem with typing Czech characters in Squeak 3.10 on Ubuntu 9.04)

Yoshiki Ohshima-2
At Mon, 17 Aug 2009 22:22:52 -0700,
Andreas Raab wrote:
>
> Folks -
>
> Armed with the input from Yoshiki, here is an attempt at addressing the
> problem of decomposed unicode input.

  Cool!

> I decided to do the handling a
> little differently from Etoys by providing a UnicodeInputInterpreter
> that does the composition in HandMorph since all the hooks were already
> available.
>
> [Yoshiki - out of curiosity, what is the reason why in the Etoys image
> this level of input conversion is not managed via the input interpreter
> but rather separately in ParagraphEditor?]

  One primary case is when "multi key" style input is used
(the user holds the "multi" key and hits a key to enter the accent):
the composition char may not come right after the base char; the user
could even move the cursor to a non-accented base char and type the key
sequence to enter just the composition char.  Another case is where
the user pastes a string that begins with a composition character into
text, and I thought it should combine that with the character before
the paste point when possible.  (This second case is not that
important, I think.)

-- Yoshiki



[squeak-dev] Re: Linux input testers needed

Andreas.Raab
Yoshiki Ohshima wrote:
>   One primary case is when "multi key" style input is used
> (the user holds the "multi" key and hits a key to enter the accent):
> the composition char may not come right after the base char; the user
> could even move the cursor to a non-accented base char and type the key
> sequence to enter just the composition char.

So you are basically saying that there are input modes where I can
position the cursor at an arbitrary position in the text and then type a
composition character to modify the character at the input position?
Wow. I had no idea ;-) Where is this used?

I guess that means back to the drawing board, but it'll be interesting
to see if the approach works at all for the case in question.

Cheers,
   - Andreas


Re: [squeak-dev] Re: Linux input testers needed

Yoshiki Ohshima-2
At Mon, 17 Aug 2009 23:03:37 -0700,
Andreas Raab wrote:

>
> Yoshiki Ohshima wrote:
> >   One primary case is when "multi key" style input is used
> > (the user holds the "multi" key and hits a key to enter the accent):
> > the composition char may not come right after the base char; the user
> > could even move the cursor to a non-accented base char and type the key
> > sequence to enter just the composition char.
>
> So you are basically saying that there are input modes where I can
> position the cursor at an arbitrary position in the text and then type a
> composition character to modify the character at the input position?
> Wow. I had no idea ;-) Where is this used?

  I'm not entirely sure how widely it is used in the world, but the XO
keyboard setting for one country used that style.  I vaguely remember
that at some point the layout was moved back to dead-key style input,
but it surely exists.

  Yes, with this, you can also stack many different accent marks on an
arbitrary base character (e.g., you can type "accent-grave-circumflex
b") in XO's Chat program, the Write activity, etc. quite easily.
With Pango enabled, Etoys can do that.

> I guess that means back to the drawing board, but it'll be interesting
> to see if the approach works at all for the case in question.

  Right.  It is all too easy to type a code point sequence with a
composition character for which no precomposed form exists.  A more
flexible renderer would be ideal, but for now it is practical for Squeak
to support only precomposed forms and just display ? for unhandled
cases...
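The distinction between stacked accents that have a precomposed form and those that do not can be checked against Unicode's own composition data. In this Python illustration, which again uses the standard `unicodedata` module rather than Squeak, one stacked sequence collapses to a single code point and the other does not:

```python
import unicodedata

def nfc_code_points(text):
    """List the code points left after canonical composition (NFC)."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFC", text)]

# e + circumflex + grave composes all the way to one precomposed letter ...
print(nfc_code_points("e\u0302\u0300"))  # ['U+1EC1']
# ... but b + grave + circumflex has no precomposed form: three code points remain.
print(nfc_code_points("b\u0300\u0302"))  # ['U+0062', 'U+0300', 'U+0302']
```

A renderer that only knows precomposed glyphs can display the first result but not the second.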

-- Yoshiki


[squeak-dev] Re: Linux input testers needed

Michal Perutka-2
In reply to this post by Andreas.Raab
Hi,

2009/8/27 Andreas Raab <[hidden email]>
> Michal Perutka wrote:
>> many thanks. I have tried your code (sorry for the delay - I spent two weeks with my family but without my laptop ;) and it works for me (hurray! :-) with these modifications:
>
> Very good.
>
>> /Character value: .../ (or /keyValue asCharacter/ in your version of the next method) doesn't work for characters with code points > 255 (e.g. "latin small letter s with caron" $š has code point 353).
>
> Actually, that was no mistake. I meant to use Character value: xxx since I want us to get away from the leading char stuff in Unicode. What happens when you use Character value: instead of Unicode value:? Does it blow up? Does it display incorrectly? Anything else?

$? for every code > 255. For example, Character value: 353 shows $?, while Character leadingChar: 14 code: 353 shows $š.

>> UnicodeInputInterpreter>>nextCharFrom: sensor firstEvt: evtBuf
>>    "Compose Unicode character sequences"
>>    | peekEvent keyValue composed |
>>    "Only try this if the first event is composable and is a character event"
>>    ((Unicode isComposable: (keyValue := evtBuf *sixth*))
>>        and:[evtBuf fourth = EventKeyChar]) ifTrue:[ ... ].
>>    "XXXX: Fixme. We should put the skipped event back if we haven't consumed it."
>>
>>    ^ *Unicode* value: keyValue
>>
>> Why evtBuf sixth? Some keys on a Czech keyboard let me type Czech characters with diacritical marks directly. I found the correct codes (code points, e.g. 353 for $š) in evtBuf at the sixth position, not at the third. And the sixth position seems to be OK for all characters.
>
> Yes, that's correct. Mistake on my part. Element three is the old MacRoman value; number six is the UTF32 code point.
>
>> And the last thing: when I run Squeak with the option -encoding UTF-8, I think the UnicodeInputInterpreter should be installed. Otherwise I have to do it manually (World primaryHand keyboardInterpreter: UnicodeInputInterpreter new).
>
> Right. I really dislike it that the Unix VM still doesn't use Unicode by default (as the Windows and Mac VMs do). Perhaps I can convince Ian to change the default.

Today I tested UnicodeInputInterpreter on Win XP and it works as well as on Linux.

Cheers
Michal




[squeak-dev] Re: Linux input testers needed

Andreas.Raab
Michal Perutka wrote:
> 2009/8/27 Andreas Raab <[hidden email]>
>     Actually, that was no mistake. I meant to use Character value: xxx
>     since I want us to get away from the leading char stuff in Unicode.
>     What happens when you use Character value: instead of Unicode
>     value:? Does it blow up? Does it display incorrectly? Anything else?
>
>
> $? for every code > 255. For example Character value: 353 shows $?,
> Character leadingChar: 14 code: 353 shows $š

That's *extremely* strange. I just tried it and it shows $š in both
cases in a current updated trunk image after selecting a suitable font
(I used Arial). What image are you using? Which font(s) did you use to
try this?

Can someone else try to verify this with a current trunk image? (I'm
wondering if I screwed something up with my own experiments here) You
should see no difference between printing (Character leadingChar: 14
code: 353) and (Character value: 353) (i.e., they either both display as
$? or they both display as $š).

Cheers,
   - Andreas


[squeak-dev] Re: Linux input testers needed

Michal Perutka-2
2009/8/28 Andreas Raab <[hidden email]>
> Michal Perutka wrote:
>> 2009/8/27 Andreas Raab <[hidden email]>
>>
>>>    Actually, that was no mistake. I meant to use Character value: xxx
>>>    since I want us to get away from the leading char stuff in Unicode.
>>>    What happens when you use Character value: instead of Unicode
>>>    value:? Does it blow up? Does it display incorrectly? Anything else?
>>
>> $? for every code > 255. For example Character value: 353 shows $?, Character leadingChar: 14 code: 353 shows $š
>
> That's *extremely* strange. I just tried it and it shows $š in both cases in a current updated trunk image after selecting a suitable font (I used Arial). What image are you using? Which font(s) did you use to try this?
> Can someone else try to verify this with a current trunk image? (I'm wondering if I screwed something up with my own experiments here) You should see no difference between printing (Character leadingChar: 14 code: 353) and (Character value: 353) (i.e., they either both display as $? or they both display as $š).

Sorry, the problem is with my font - I use a Latin2 bitmap font. When I use some TTF font (installed from Windows), it is OK.

Cheers
Michal



[squeak-dev] Re: Linux input testers needed

Michal Perutka-2
In reply to this post by Andreas.Raab
2009/8/27 Andreas Raab <[hidden email]>
> Michal Perutka wrote:
>> UnicodeInputInterpreter>>nextCharFrom: sensor firstEvt: evtBuf
>>    "Compose Unicode character sequences"
>>    | peekEvent keyValue composed |
>>    "Only try this if the first event is composable and is a character event"
>>    ((Unicode isComposable: (keyValue := evtBuf *sixth*))
>>        and:[evtBuf fourth = EventKeyChar]) ifTrue:[ ... ].
>>    "XXXX: Fixme. We should put the skipped event back if we haven't consumed it."
>>
>>    ^ *Unicode* value: keyValue
>>
>> Why evtBuf sixth? Some keys on a Czech keyboard let me type Czech characters with diacritical marks directly. I found the correct codes (code points, e.g. 353 for $š) in evtBuf at the sixth position, not at the third. And the sixth position seems to be OK for all characters.
>
> Yes, that's correct. Mistake on my part. Element three is the old MacRoman value; number six is the UTF32 code point.


Using evtBuf sixth throughout the method caused problems with ctrl-c, ctrl-s, etc., and even mouse-wheel scrolling stopped working. This modification seems to fix them:

UnicodeInputInterpreter>>nextCharFrom: sensor firstEvt: evtBuf
    "Compose Unicode character sequences"
    "Only try this if the first event is composable and is a character event"
    | peekEvent keyValue composed |
    keyValue := evtBuf sixth > 127 ifTrue: [evtBuf sixth] ifFalse: [evtBuf third].
    ((Unicode isComposable: keyValue) and: [evtBuf fourth = EventKeyChar])
        ifTrue: [ ...
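The key-selection rule in this modification can be stated on its own. The rest of the Smalltalk method is elided in the post and is not reproduced here. Below is a hypothetical Python restatement in which `key_value` is an illustrative name, not part of the change set. For code points above 127 it takes the UTF-32 field (element 6). Otherwise it falls back to the legacy character code (element 3), presumably so that ASCII and control keys keep the values the rest of the image already expects:

```python
def key_value(evt):
    """Prefer the UTF-32 code point (element 6) for non-ASCII input,
    otherwise use the legacy character code (element 3)."""
    utf32, char_code = evt[5], evt[2]
    return utf32 if utf32 > 127 else char_code

print(key_value((2, 2841355, 225, 0, 0, 225, 0, 0)))  # 225
print(key_value((2, 2872015, 769, 0, 0, 769, 0, 0)))  # 769
```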

Cheers
Michal



[squeak-dev] Re: Linux input testers needed

Andreas.Raab
In reply to this post by Michal Perutka-2
Michal Perutka wrote:

> 2009/8/28 Andreas Raab <[hidden email]>
>     Can someone else try to verify this with a current trunk image? (I'm
>     wondering if I screwed something up with my own experiments here)
>     You should see no difference between printing (Character
>     leadingChar: 14 code: 353) and (Character value: 353) (i.e., they
>     either both display as $? or they both display as $š).
>
>
> Sorry, the problem is with my font - I use a Latin2 bitmap font. When I
> use some TTF font (installed from Windows), it is OK.

Phew! Glad we sorted that out ;-)

Cheers,
   - Andreas