Hi Yoshiki (and everyone else knowledgeable in m17n) -
I've been looking through some of the m17n stuff to simplify things and noticed some parts that I really don't know if they're still used or not. I don't want to remove them if they're used but I want to make sure we're not carrying dead weight (and some of it seems obsolete):

* HandMorph's CompositionManager: There is ImmAbstractPlatform, ImmWin32 and ImmX11. Are these still in use and functional? Should we continue to support them?

* LanguageEnvironment converters: Is there any reason to assume that we will ever need to support any encodings other than UTF8/Unicode for the VM/image interface? Should we just get rid of all of these different converter methods and use the UTF8/Unicode conversions directly, i.e., instead of:

    converter := LanguageEnvironment defaultFileNameConverter.
    squeakPathName := vmPathString convertFromWithConverter: converter.

the code becomes:

    squeakPathName := vmPathString utf8ToSqueak.

* Converter classes: If the answer to the previous question is that we use UTF8/Unicode consistently, is there any reason whatsoever to keep the clipboard or keyboard interpreter classes? (we're talking a *lot* of classes here; keyboard interpreter has 15 subclasses; clipboard interpreter 12 etc).

* EncodedCharSet: Are any encodings other than Unicode currently in use? Do we need to explicitly support domestic CJK encodings given that we have Unicode + language tag?

Any comments on these issues are greatly appreciated.

Cheers,
  - Andreas
At Tue, 01 Sep 2009 21:53:13 -0700,
Andreas Raab wrote:
>
> Hi Yoshiki (and everyone else knowledgeable in m17n) -
>
> I've been looking through some of the m17n stuff to simplify things and
> noticed some parts that I really don't know if they're still used or
> not. I don't want to remove them if they're used but I want to make sure
> we're not carrying dead weight (and some of it seems obsolete):
>
> * HandMorph's CompositionManager: There is ImmAbstractPlatform, ImmWin32
> and ImmX11. Are these still in use and functional? Should we continue to
> support them?

  Yes, and yes. I probably should put the plugin code up somewhere. The Unix VM supports it (or used to; I haven't tried it with the latest).

> * LanguageEnvironment converters: Is there any reason to assume that we
> will ever need to support any encodings other than UTF8/Unicode for the
> VM/image interface? Should we just get rid of all of these different
> converter methods and use the UTF8/Unicode conversions directly, i.e.,
> instead of:
>
>     converter := LanguageEnvironment defaultFileNameConverter.
>     squeakPathName := vmPathString convertFromWithConverter: converter.
>
> the code becomes:
>
>     squeakPathName := vmPathString utf8ToSqueak.

  For file names, in general, it is ok by now.

  The complication is reading the file names in a zip file. The name interpretation has to be special. Zip files that are being created, and those created in the past, use Shift-JIS for the archive members' names (I wonder whether it is still 8859-1 in Western Europe?). The #defaultSystemConverter variant should stay for this purpose.

> * Converter classes: If the answer to the previous question is that we
> use UTF8/Unicode consistently, is there any reason whatsoever to keep
> the clipboard or keyboard interpreter classes? (we're talking a *lot* of
> classes here; keyboard interpreter has 15 subclasses; clipboard
> interpreter 12 etc).

  The only reason would be to manage the language tag for some CJK languages.

> * EncodedCharSet: Are any encodings other than Unicode currently in use?
> Do we need to explicitly support domestic CJK encodings given that we
> have Unicode + language tag?

  - Because Unicode doesn't offer round-trip conversion from/to some of these encodings, one stance that Squeak's m17n alludes to, and that some other systems such as Ruby's m17n and Gauche Scheme's mechanism try to take, is to allow non-Unicode-encoded characters to be stored in a manner similar to what we did with the language tag, and to ensure that the input and output of these strings stay consistent. I would kind of like to keep that ability.

  - There are even Etoys projects from the old days that use JIS X 0208. If we want to be able to load them into a possible future Etoys on mainstream Squeak, we would probably rather keep them.

-- Yoshiki
Yoshiki Ohshima wrote:
>> * HandMorph's CompositionManager: There is ImmAbstractPlatform, ImmWin32
>> and ImmX11. Are these still in use and functional? Should we continue to
>> support them?
>
>   Yes, and yes. I probably should put the plugin code up somewhere.
> The Unix VM supports it (or used to; I haven't tried it with the latest).

Thanks. It would be good if we could have the plugins on squeakvm.org. That makes it easier to verify that this code is present and up-to-date.

>> * LanguageEnvironment converters: Is there any reason to assume that we
>> will ever need to support any encodings other than UTF8/Unicode for the
>> VM/image interface? Should we just get rid of all of these different
>> converter methods and use the UTF8/Unicode conversions directly, i.e.,
>> instead of:
>>
>>     converter := LanguageEnvironment defaultFileNameConverter.
>>     squeakPathName := vmPathString convertFromWithConverter: converter.
>>
>> the code becomes:
>>
>>     squeakPathName := vmPathString utf8ToSqueak.
>
>   For file names, in general, it is ok by now.
>
>   The complication is reading the file names in a zip file. The name
> interpretation has to be special. Zip files that are being created, and
> those created in the past, use Shift-JIS for the archive members' names
> (I wonder whether it is still 8859-1 in Western Europe?). The
> #defaultSystemConverter variant should stay for this purpose.

Correct me if I'm wrong, but the #defaultSystemConverter doesn't seem to be used for this. From what I can see in a trunk image, the only reference is in ZipArchiveMember>>refreshLocalFileHeaderTo: using asVmPathName which on a current UTF-8 enabled VM would always use UTF-8 anyway. Is this currently broken?

>> * Converter classes: If the answer to the previous question is that we
>> use UTF8/Unicode consistently, is there any reason whatsoever to keep
>> the clipboard or keyboard interpreter classes? (we're talking a *lot* of
>> classes here; keyboard interpreter has 15 subclasses; clipboard
>> interpreter 12 etc).
>
>   The only reason would be to manage the language tag for some CJK
> languages.

Could we fold this into the UTF8 converter? I.e., if the environment is not language-neutral, insert the appropriate language tag?

>> * EncodedCharSet: Are any encodings other than Unicode currently in use?
>> Do we need to explicitly support domestic CJK encodings given that we
>> have Unicode + language tag?
>
>   - Because Unicode doesn't offer round-trip conversion from/to some
>     of these encodings, one stance that Squeak's m17n alludes to, and
>     that some other systems such as Ruby's m17n and Gauche Scheme's
>     mechanism try to take, is to allow non-Unicode-encoded characters
>     to be stored in a manner similar to what we did with the language
>     tag, and to ensure that the input and output of these strings stay
>     consistent. I would kind of like to keep that ability.
>
>   - There are even Etoys projects from the old days that use JIS X
>     0208. If we want to be able to load them into a possible future
>     Etoys on mainstream Squeak, we would probably rather keep them.

Fair enough. I'll leave it alone. Thanks for the help!

Cheers,
  - Andreas
At Wed, 02 Sep 2009 21:58:04 -0700,
Andreas Raab wrote:
>
> >> * LanguageEnvironment converters: Is there any reason to assume that we
> >> will ever need to support any encodings other than UTF8/Unicode for the
> >> VM/image interface? Should we just get rid of all of these different
> >> converter methods and use the UTF8/Unicode conversions directly, i.e.,
> >> instead of:
> >>
> >>     converter := LanguageEnvironment defaultFileNameConverter.
> >>     squeakPathName := vmPathString convertFromWithConverter: converter.
> >>
> >> the code becomes:
> >>
> >>     squeakPathName := vmPathString utf8ToSqueak.
> >
> >   For file names, in general, it is ok by now.
> >
> >   The complication is reading the file names in a zip file. The name
> > interpretation has to be special. Zip files that are being created, and
> > those created in the past, use Shift-JIS for the archive members' names
> > (I wonder whether it is still 8859-1 in Western Europe?). The
> > #defaultSystemConverter variant should stay for this purpose.
>
> Correct me if I'm wrong, but the #defaultSystemConverter doesn't seem to
> be used for this. From what I can see in a trunk image, the only
> reference is in ZipArchiveMember>>refreshLocalFileHeaderTo: using
> asVmPathName which on a current UTF-8 enabled VM would always use UTF-8
> anyway. Is this currently broken?

  If that is the case, probably. I'll try to check it later.

> >> * Converter classes: If the answer to the previous question is that we
> >> use UTF8/Unicode consistently, is there any reason whatsoever to keep
> >> the clipboard or keyboard interpreter classes? (we're talking a *lot* of
> >> classes here; keyboard interpreter has 15 subclasses; clipboard
> >> interpreter 12 etc).
> >
> >   The only reason would be to manage the language tag for some CJK
> > languages.
>
> Could we fold this into the UTF8 converter? I.e., if the environment is
> not language-neutral, insert the appropriate language tag?

  It can be a feature of the UTF-8 converter (Philippe probably wouldn't like it, though). I'd make it explicit so that the user of the converter gets to decide what tag to put.

  Thank you!

-- Yoshiki
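The idea of a UTF-8 converter with an explicit, caller-chosen language tag can be sketched roughly as follows. This is an illustration in Python, not Squeak code: the function names, the tag number, the 22-bit shift, and the choice to leave code points below 256 untagged are all assumptions made for this example and are not taken from the thread.

```python
# Illustrative sketch only; none of these names exist in Squeak.
# Assumed layout: the language tag sits in the bits above a 22-bit code point.
TAG_SHIFT = 22
CODE_MASK = (1 << TAG_SHIFT) - 1

JAPANESE = 1  # hypothetical tag number, chosen only for this example


def decode_utf8_with_tag(data, language_tag=0):
    """Decode UTF-8 bytes into integers that carry a caller-chosen tag.

    The caller, not the converter, decides which tag to attach.
    Code points below 256 are left untagged (an assumption of this sketch).
    """
    values = []
    for ch in data.decode("utf-8"):
        code = ord(ch)
        tag = language_tag if code >= 256 else 0
        values.append((tag << TAG_SHIFT) | code)
    return values


def split_tag(value):
    """Return (tag, code point) for a tagged integer."""
    return value >> TAG_SHIFT, value & CODE_MASK


tagged = decode_utf8_with_tag("A語".encode("utf-8"), language_tag=JAPANESE)
untagged = decode_utf8_with_tag("A語".encode("utf-8"))
```

With the default `language_tag=0` the converter behaves as a plain, language-neutral UTF-8 decoder, so the tagging stays opt-in for callers who need it.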
At Sun, 06 Sep 2009 12:52:26 -0700,
Yoshiki Ohshima wrote:
>
> > Correct me if I'm wrong, but the #defaultSystemConverter doesn't seem to
> > be used for this. From what I can see in a trunk image, the only
> > reference is in ZipArchiveMember>>refreshLocalFileHeaderTo: using
> > asVmPathName which on a current UTF-8 enabled VM would always use UTF-8
> > anyway. Is this currently broken?
>
>   If that is the case, probably. I'll try to check it later.

  Yes, ZipFileMember>>readCentralDirectoryFileHeaderFrom: is wrong to send asSqueakPathName for it. As with the fileComment variable, #convertFromSystemString, or a faster variant of it, should be the right thing.

-- Yoshiki
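To illustrate why zip member names need a system-encoding conversion rather than a UTF-8 one, here is a minimal sketch in Python (not Squeak code). The helper `decode_member_name` and its parameters are hypothetical; the only outside fact it relies on is that bit 11 of a zip entry's general purpose flags marks the name as UTF-8, and that names without this flag are in whatever encoding the archiving system used.

```python
# Illustrative sketch only; decode_member_name is a hypothetical helper.
def decode_member_name(raw, utf8_flag, system_encoding):
    """Decode the raw bytes of a zip member name.

    utf8_flag mirrors bit 11 of the zip general purpose flags.
    Without it, the caller must supply the legacy system encoding.
    """
    if utf8_flag:
        return raw.decode("utf-8")
    return raw.decode(system_encoding)


name = "日本語.txt"
raw = name.encode("shift_jis")  # how a legacy archive would store the name

# Interpreting the legacy bytes as UTF-8 does not work.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Interpreting them with the system encoding recovers the name.
decoded = decode_member_name(raw, utf8_flag=False, system_encoding="shift_jis")
```

The same bytes are only meaningful once the reader knows which encoding produced them, which is the role the system converter plays in the discussion above.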
2009/9/6 Yoshiki Ohshima <[hidden email]>:
> At Wed, 02 Sep 2009 21:58:04 -0700,
> Andreas Raab wrote:
>>
>> >> * LanguageEnvironment converters: Is there any reason to assume that we
>> >> will ever need to support any encodings other than UTF8/Unicode for the
>> >> VM/image interface? Should we just get rid of all of these different
>> >> converter methods and use the UTF8/Unicode conversions directly, i.e.,
>> >> instead of:
>> >>
>> >>     converter := LanguageEnvironment defaultFileNameConverter.
>> >>     squeakPathName := vmPathString convertFromWithConverter: converter.
>> >>
>> >> the code becomes:
>> >>
>> >>     squeakPathName := vmPathString utf8ToSqueak.
>> >
>> >   For file names, in general, it is ok by now.
>> >
>> >   The complication is reading the file names in a zip file. The name
>> > interpretation has to be special. Zip files that are being created, and
>> > those created in the past, use Shift-JIS for the archive members' names
>> > (I wonder whether it is still 8859-1 in Western Europe?). The
>> > #defaultSystemConverter variant should stay for this purpose.
>>
>> Correct me if I'm wrong, but the #defaultSystemConverter doesn't seem to
>> be used for this. From what I can see in a trunk image, the only
>> reference is in ZipArchiveMember>>refreshLocalFileHeaderTo: using
>> asVmPathName which on a current UTF-8 enabled VM would always use UTF-8
>> anyway. Is this currently broken?
>
>   If that is the case, probably. I'll try to check it later.
>
>> >> * Converter classes: If the answer to the previous question is that we
>> >> use UTF8/Unicode consistently, is there any reason whatsoever to keep
>> >> the clipboard or keyboard interpreter classes? (we're talking a *lot* of
>> >> classes here; keyboard interpreter has 15 subclasses; clipboard
>> >> interpreter 12 etc).
>> >
>> >   The only reason would be to manage the language tag for some CJK
>> > languages.
>>
>> Could we fold this into the UTF8 converter? I.e., if the environment is
>> not language-neutral, insert the appropriate language tag?
>
>   It can be a feature of the UTF-8 converter (Philippe probably wouldn't
> like it, though).

Indeed he doesn't. But he uses a modified fast-path from Andreas anyway.

Cheers
Philippe