Hi Nicolas,

Could you give us an executive summary of the reasoning behind this change? Does this imply that we get Unicode by default, and that Unicode is integrated into the table of languages (replacing Latin1), so that we no longer have to check all the time whether it is 255?

Stef

I like all these opportunities to learn things.

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
Unicode Leading Char
--------------------
The default leading char is now 0 instead of 255, making Unicode - UTF conversions stable. See also the discussion at
http://lists.squeakfoundation.org/pipermail/squeak-dev/2009-August/138915.html

2010/3/8 Stéphane Ducasse <[hidden email]>:
> Hi nicolas
>
> could you give us an executive summary of the reasoning behind this change?
> Does this implies that we get Unicode by default?
> and that Unicode is integrated in the table of language (replacing latin1),
> so this way we do not have to check all the time is it 255?
>
> Stef
> I like all these opportunities to learn things.
Thanks a lot
I think that this is a good decision. Here is the mail from Andreas that Nicolas mentioned.

Folks -

I think it's time to do something about the leadingChar in Characters that has been on the TODO list for a while. I have been looking over this stuff for some time now, fixing things here and there and laying some of the groundwork for the things to come.

Here is the good news: Squeak doesn't need the leadingChar any longer. If you are running an updated trunk image you can run entirely without the leadingChar being used, and I've done this for about a week now with no ill side effects (disclaimer: I haven't been using very much of the m17n support stuff, so there may still be breakage, but it means it won't explode in your face straightaway). If you would like to try it yourself, all you need to do is to hack Character>>setValue: to say, e.g.,

    value := newValue bitClear: 16r3FC00000.

and you're good (and won't ever see a leadingChar).

However, the removal of the leading char could be used to do a couple of other things that I would like to discuss, and on which I would like to solicit feedback, in particular from the folks who care about the leadingChar. The main insight is that although we *can* run without the leadingChar, it doesn't mean we *have* to.

As it stands, the leading char is used for two purposes: character set selection (EncodedCharSet) and (parts of) language support. There is a significant amount of confusion between the two, with Latin1/Latin2Environment being subclasses of LanguageEnvironment (although these are character encodings, not languages).

What I would propose to do here is to redefine "leadingChar = 0", which currently means "Latin1 encoding, language neutral", to mean "Unicode encoding, language neutral". The effect is that "Character value: 353" and "Unicode value: 353" become the same if the environment is considered language neutral, which by default it would be.
All environments except those that care about the connotations of the language tag should be able to work with this definition without any change whatsoever. The only things that change are that the default LanguageEnvironment is Unicode based, using leadingChar=0; that most of the subclasses go away (being replaced by the default LanguageEnvironment); and that for those we care about, or that need a transition plan (i.e., the CJK languages), we keep using the language tag for the time being.

That means that *if* you set your language environment to be one of the CJK languages you get a language tag in your strings, but by default the language neutral environment will produce "plain Unicode", which should make the server/Seaside/Aida people a lot happier when dealing with this stuff.

For the CJK languages (or other languages requiring support that has so far been expressed via the language tag) we can use this opportunity to phase out the use of the language tag in favor of text attributes (which would have to be written first).

The main advantage of the proposal is that the people who would like to use plain Unicode get to use it, and the people who care about the language tag and its consequences can still use that as well.

How does that sound?

Cheers,
  - Andreas

On Mar 8, 2010, at 10:49 PM, Nicolas Cellier wrote:

> Unicode Leading Char
> --------------------
> The default leading char is now 0 instead of 255 making Unicode - UTF
> conversions stable. See also discussion at
> http://lists.squeakfoundation.org/pipermail/squeak-dev/2009-August/138915.html
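For readers who want to see the arithmetic behind the leadingChar, here is a minimal sketch in Python rather than Smalltalk. It assumes the layout implied by the mask 16r3FC00000 quoted in Andreas's mail: an 8-bit leading char in bits 22..29, with the character code in the low 22 bits. The function names are hypothetical and are not Squeak or Pharo selectors; only the mask and the values 0, 255 and 353 come from the thread.

```python
# Sketch only: mirrors the integer layout implied by the mask 16r3FC00000.
# All names below are hypothetical; they are not Squeak or Pharo selectors.

LEADING_CHAR_MASK = 0x3FC00000   # 16r3FC00000 from the mail: bits 22..29
LEADING_CHAR_SHIFT = 22          # assumption: index of the mask's lowest set bit
CODE_MASK = (1 << LEADING_CHAR_SHIFT) - 1  # assumption: low 22 bits hold the code


def make_value(leading_char: int, code: int) -> int:
    """Pack a leading char (0..255) and a character code into one integer."""
    if not 0 <= leading_char <= 0xFF:
        raise ValueError("leading char must fit in 8 bits")
    if not 0 <= code <= CODE_MASK:
        raise ValueError("code must fit in 22 bits")
    return (leading_char << LEADING_CHAR_SHIFT) | code


def leading_char(value: int) -> int:
    """Extract the leading char from a packed value."""
    return (value & LEADING_CHAR_MASK) >> LEADING_CHAR_SHIFT


def strip_leading_char(value: int) -> int:
    """Python equivalent of `newValue bitClear: 16r3FC00000`."""
    return value & ~LEADING_CHAR_MASK


tagged = make_value(255, 353)   # old default: leading char 255
neutral = make_value(0, 353)    # new default: leading char 0

print(tagged == 353)                      # False: the tag changes the integer value
print(neutral == 353)                     # True: value equals the Unicode code point
print(strip_leading_char(tagged) == 353)  # True: clearing the tag recovers the code
```

Under these assumptions, a leading char of 0 leaves the character value equal to the plain Unicode code point, which is one way to read the remark that the new default makes Unicode - UTF conversions stable: no tag has to be added or removed on the way in or out.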