[squeak-dev] m17n simplification questions

Andreas.Raab

[squeak-dev] m17n simplification questions

Hi Yoshiki (and everyone else knowledgeable in m17n) -

I've been looking through some of the m17n stuff to simplify things and
noticed some parts that I really don't know if they're still used or
not. I don't want to remove them if they're used but I want to make sure
we're not carrying dead weight (and some of it seems obsolete):

* HandMorph's CompositionManager: There is ImmAbstractPlatform, ImmWin32
and ImmX11. Are these still in use and functional? Should we continue to
support them?

* LanguageEnvironment converters: Is there any reason to assume that we
will ever need to support any encodings other than UTF8/Unicode for the
VM/image interface? Should we just get rid of all of these different
converter methods and use the UTF8/Unicode conversions directly, i.e.,
instead of:

converter := LanguageEnvironment defaultFileNameConverter.
squeakPathName := vmPathString convertFromWithConverter: converter.

the code becomes:

squeakPathName := vmPathString utf8ToSqueak.

* Converter classes: If the answer to the previous question is that we
use UTF8/Unicode consistently, is there any reason whatsoever to keep
the clipboard or keyboard interpreter classes? (we're talking a *lot* of
classes here; keyboard interpreter has 15 subclasses; clipboard
interpreter 12 etc).

* EncodedCharSet: Are any encodings other than Unicode currently in use?
Do we need to explicitly support domestic CJK encodings given that we
have Unicode + language tag?

Any comments on these issues are greatly appreciated.

Cheers,
- Andreas

Yoshiki Ohshima-2

Re: [squeak-dev] m17n simplification questions

At Tue, 01 Sep 2009 21:53:13 -0700,
Andreas Raab wrote:

>
> Hi Yoshiki (and everyone else knowledgeable in m17n) -
>
> I've been looking through some of the m17n stuff to simplify things and
> noticed some parts that I really don't know if they're still used or
> not. I don't want to remove them if they're used but I want to make sure
> we're not carrying dead weight (and some of it seems obsolete):
>
> * HandMorph's CompositionManager: There is ImmAbstractPlatform, ImmWin32
> and ImmX11. Are these still in use and functional? Should we continue to
> support them?

Yes, and yes. I probably should put the plugin code up somewhere.
The Unix VM supports it (or used to; I haven't tried it in the latest
version).

> * LanguageEnvironment converters: Is there any reason to assume that we
> will ever need to support any encodings other than UTF8/Unicode for the
> VM/image interface? Should we just get rid of all of these different
> converter methods and use the UTF8/Unicode conversions directly, i.e.,
> instead of:
>
> converter := LanguageEnvironment defaultFileNameConverter.
> squeakPathName := vmPathString convertFromWithConverter: converter.
>
> the code becomes:
>
> squeakPathName := vmPathString utf8ToSqueak.

For file names, in general, it is ok by now.

The complication is reading the file names in a zip file. The name
interpretation has to be special. Zip files, both those being created
now and those created in the past, use Shift-JIS for the archive
members' names (I wonder whether it is still 8859-1 in Western
Europe?). The #defaultSystemConverter variant should stay for this
purpose.
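
The zip-name problem described above can be sketched outside a Squeak
image. The following is a minimal Python illustration, not Squeak code
and not taken from the thread. It assumes a hypothetical helper,
decode_member_name, that uses UTF-8 only when general purpose bit 11 of
the zip header is set, and otherwise falls back to a legacy encoding
chosen by the caller (the role #defaultSystemConverter would play here).
The member name is made up for the example.

```python
UTF8_NAME_FLAG = 0x0800  # general purpose bit 11 in a zip header


def decode_member_name(raw: bytes, flags: int, legacy_encoding: str) -> str:
    """Decode a zip member name (hypothetical helper for illustration).

    UTF-8 is used only when the archive says so; otherwise the caller
    supplies the legacy encoding, e.g. "shift_jis" for older Japanese zips.
    """
    if flags & UTF8_NAME_FLAG:
        return raw.decode("utf-8")
    return raw.decode(legacy_encoding)


def is_valid_utf8(raw: bytes) -> bool:
    """Report whether the bytes can be read as UTF-8 at all."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Made-up member name: three kanji plus ".txt", written with escapes.
name = "\u65e5\u672c\u8a9e.txt"
sjis_bytes = name.encode("shift_jis")

# Treating the stored bytes as UTF-8 fails; the legacy path recovers them.
print(is_valid_utf8(sjis_bytes))
print(decode_member_name(sjis_bytes, 0, "shift_jis") == name)
```

The point of the sketch is only that a fixed "always UTF-8" path cannot
read such names, so some way of selecting the legacy converter has to
remain available for archive members.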

> * Converter classes: If the answer to the previous question is that we
> use UTF8/Unicode consistently, is there any reason whatsoever to keep
> the clipboard or keyboard interpreter classes? (we're talking a *lot* of
> classes here; keyboard interpreter has 15 subclasses; clipboard
> interpreter 12 etc).

The only reason would be to manage the language tag for some CJK
languages.

> * EncodedCharSet: Are any encodings other than Unicode currently in use?
> Do we need to explicitly support domestic CJK encodings given that we
> have Unicode + language tag?

- Because Unicode doesn't offer round-trip conversion from/to some
of these encodings, one stance that Squeak's m17n alludes to, and
that some other systems such as Ruby's m17n and Gauche Scheme's
mechanism try to take, is to allow non-Unicode encoded chars to be
stored in a manner similar to what we did with the language tag,
and to ensure that the input and output of these strings are
consistent. I would kind of like to keep that ability.

- There are even Etoys projects created in the old days that use
JIS X 0208. If we want to be able to load them into a possible
future Etoys on mainstream Squeak, we would probably rather keep
them.

-- Yoshiki

Andreas.Raab

[squeak-dev] Re: m17n simplification questions

Yoshiki Ohshima wrote:
>> * HandMorph's CompositionManager: There is ImmAbstractPlatform, ImmWin32
>> and ImmX11. Are these still in use and functional? Should we continue to
>> support them?
>
> Yes, and yes. I probably should put the plugin code up somewhere.
> The Unix VM supports it (or used to; I haven't tried it in the latest
> version).

Thanks. It would be good if we could have the plugins on squeakvm.org.
That makes it easier to verify that this code is present and up-to-date.

>> * LanguageEnvironment converters: Is there any reason to assume that we
>> will ever need to support any encodings other than UTF8/Unicode for the
>> VM/image interface? Should we just get rid of all of these different
>> converter methods and use the UTF8/Unicode conversions directly, i.e.,
>> instead of:
>>
>> converter := LanguageEnvironment defaultFileNameConverter.
>> squeakPathName := vmPathString convertFromWithConverter: converter.
>>
>> the code becomes:
>>
>> squeakPathName := vmPathString utf8ToSqueak.
>
> For file names, in general, it is ok by now.
>
> The complication is reading the file names in a zip file. The name
> interpretation has to be special. Zip files, both those being created
> now and those created in the past, use Shift-JIS for the archive
> members' names (I wonder whether it is still 8859-1 in Western
> Europe?). The #defaultSystemConverter variant should stay for this
> purpose.

Correct me if I'm wrong, but the #defaultSystemConverter doesn't seem to
be used for this. From what I can see in a trunk image, the only
reference is in ZipArchiveMember>>refreshLocalFileHeaderTo: using
asVmPathName which on a current UTF-8 enabled VM would always use UTF-8
anyway. Is this currently broken?

>> * Converter classes: If the answer to the previous question is that we
>> use UTF8/Unicode consistently, is there any reason whatsoever to keep
>> the clipboard or keyboard interpreter classes? (we're talking a *lot* of
>> classes here; keyboard interpreter has 15 subclasses; clipboard
>> interpreter 12 etc).
>
> The only reason would be to manage the language tag for some CJK
> languages.

Could we fold this into the UTF8 converter? I.e., if the environment is
not language-neutral, insert the appropriate language tag?

>> * EncodedCharSet: Are any encodings other than Unicode currently in use?
>> Do we need to explicitly support domestic CJK encodings given that we
>> have Unicode + language tag?
>
> - Because Unicode doesn't offer round-trip conversion from/to some
> of these encodings, one stance that Squeak's m17n alludes to, and
> that some other systems such as Ruby's m17n and Gauche Scheme's
> mechanism try to take, is to allow non-Unicode encoded chars to be
> stored in a manner similar to what we did with the language tag,
> and to ensure that the input and output of these strings are
> consistent. I would kind of like to keep that ability.
>
> - There are even Etoys projects created in the old days that use
> JIS X 0208. If we want to be able to load them into a possible
> future Etoys on mainstream Squeak, we would probably rather keep
> them.

Fair enough. I'll leave it alone.

Thanks for the help!

Cheers,
- Andreas

Yoshiki Ohshima-2

Re: [squeak-dev] Re: m17n simplification questions

At Wed, 02 Sep 2009 21:58:04 -0700,
Andreas Raab wrote:

>
> >> * LanguageEnvironment converters: Is there any reason to assume that we
> >> will ever need to support any encodings other than UTF8/Unicode for the
> >> VM/image interface? Should we just get rid of all of these different
> >> converter methods and use the UTF8/Unicode conversions directly, i.e.,
> >> instead of:
> >>
> >> converter := LanguageEnvironment defaultFileNameConverter.
> >> squeakPathName := vmPathString convertFromWithConverter: converter.
> >>
> >> the code becomes:
> >>
> >> squeakPathName := vmPathString utf8ToSqueak.
> >
> > For file names, in general, it is ok by now.
> >
> > The complication is reading the file names in a zip file. The name
> > interpretation has to be special. Zip files, both those being created
> > now and those created in the past, use Shift-JIS for the archive
> > members' names (I wonder whether it is still 8859-1 in Western
> > Europe?). The #defaultSystemConverter variant should stay for this
> > purpose.
>
> Correct me if I'm wrong, but the #defaultSystemConverter doesn't seem to
> be used for this. From what I can see in a trunk image, the only
> reference is in ZipArchiveMember>>refreshLocalFileHeaderTo: using
> asVmPathName which on a current UTF-8 enabled VM would always use UTF-8
> anyway. Is this currently broken?

If that is the case, probably. I'll try to check it later.

> >> * Converter classes: If the answer to the previous question is that we
> >> use UTF8/Unicode consistently, is there any reason whatsoever to keep
> >> the clipboard or keyboard interpreter classes? (we're talking a *lot* of
> >> classes here; keyboard interpreter has 15 subclasses; clipboard
> >> interpreter 12 etc).
> >
> > The only reason would be to manage the language tag for some CJK
> > languages.
>
> Could we fold this into the UTF8 converter? I.e., if the environment is
> not language-neutral, insert the appropriate language tag?

It can be a feature of the UTF-8 converter (Philippe probably wouldn't
like it, though). I'd make it explicit so that the user of the
converter gets to decide which tag to put in.
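
The "explicit" variant suggested above could look roughly like the
following. This is a minimal Python sketch rather than Squeak code, and
every name in it (decode_utf8_tagged, the "ja" tag, the pair
representation) is an assumption made for illustration. It does not
model how Squeak actually stores the language tag in a character.

```python
from typing import List, Optional, Tuple

# A decoded character paired with the language tag the caller chose.
TaggedChar = Tuple[Optional[str], str]


def decode_utf8_tagged(data: bytes,
                       language_tag: Optional[str] = None) -> List[TaggedChar]:
    """Decode UTF-8 and attach a caller-supplied language tag.

    The converter never guesses a tag from the environment; with no tag
    given, the characters stay language-neutral (tag is None).
    """
    return [(language_tag, ch) for ch in data.decode("utf-8")]


data = "\u6f22\u5b57".encode("utf-8")  # two kanji, written with escapes

neutral = decode_utf8_tagged(data)
japanese = decode_utf8_tagged(data, language_tag="ja")
print(neutral)
print(japanese)
```

Keeping the tag as a parameter leaves the plain UTF-8 path untouched for
callers that do not care about language tags.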

Thank you!

-- Yoshiki

Yoshiki Ohshima-2

Re: [squeak-dev] Re: m17n simplification questions

At Sun, 06 Sep 2009 12:52:26 -0700,
Yoshiki Ohshima wrote:
>
> > Correct me if I'm wrong, but the #defaultSystemConverter doesn't seem to
> > be used for this. From what I can see in a trunk image, the only
> > reference is in ZipArchiveMember>>refreshLocalFileHeaderTo: using
> > asVmPathName which on a current UTF-8 enabled VM would always use UTF-8
> > anyway. Is this currently broken?
>
> If that is the case, probably. I'll try to check it later.

Yes, ZipFileMember>>readCentralDirectoryFileHeaderFrom: is wrong to
send asSqueakPathName for it. As with the fileComment variable,
#convertFromSystemString, or a faster variation of it, should be the
right thing.

-- Yoshiki

Philippe Marschall

Re: [squeak-dev] Re: m17n simplification questions

In reply to this post by Yoshiki Ohshima-2

2009/9/6 Yoshiki Ohshima <[hidden email]>:

> At Wed, 02 Sep 2009 21:58:04 -0700,
> Andreas Raab wrote:
>>
>> >> * LanguageEnvironment converters: Is there any reason to assume that we
>> >> will ever need to support any encodings other than UTF8/Unicode for the
>> >> VM/image interface? Should we just get rid of all of these different
>> >> converter methods and use the UTF8/Unicode conversions directly, i.e.,
>> >> instead of:
>> >>
>> >> converter := LanguageEnvironment defaultFileNameConverter.
>> >> squeakPathName := vmPathString convertFromWithConverter: converter.
>> >>
>> >> the code becomes:
>> >>
>> >> squeakPathName := vmPathString utf8ToSqueak.
>> >
>> > For file names, in general, it is ok by now.
>> >
>> > The complication is reading the file names in a zip file. The name
>> > interpretation has to be special. Zip files, both those being created
>> > now and those created in the past, use Shift-JIS for the archive
>> > members' names (I wonder whether it is still 8859-1 in Western
>> > Europe?). The #defaultSystemConverter variant should stay for this
>> > purpose.
>>
>> Correct me if I'm wrong, but the #defaultSystemConverter doesn't seem to
>> be used for this. From what I can see in a trunk image, the only
>> reference is in ZipArchiveMember>>refreshLocalFileHeaderTo: using
>> asVmPathName which on a current UTF-8 enabled VM would always use UTF-8
>> anyway. Is this currently broken?
>
> If that is the case, probably. I'll try to check it later.
>
>> >> * Converter classes: If the answer to the previous question is that we
>> >> use UTF8/Unicode consistently, is there any reason whatsoever to keep
>> >> the clipboard or keyboard interpreter classes? (we're talking a *lot* of
>> >> classes here; keyboard interpreter has 15 subclasses; clipboard
>> >> interpreter 12 etc).
>> >
>> > The only reason would be to manage the language tag for some CJK
>> > languages.
>>
>> Could we fold this into the UTF8 converter? I.e., if the environment is
>> not language-neutral, insert the appropriate language tag?
>
> It can be a feature of the UTF-8 converter (Philippe probably wouldn't
> like it, though).

Indeed he doesn't. But he uses a modified fast-path from Andreas anyway.

Cheers
Philippe