Fwd: about Issue 2118: Let Unicode leadingChar = 0

Stéphane Ducasse
Hi Nicolas,

could you give us an executive summary of the reasoning behind this change?
Does this imply that we get Unicode by default,
and that Unicode is integrated in the table of languages (replacing Latin1),
so that we no longer have to check all the time whether the leading char is 255?

Stef
I like all these opportunities to learn things.

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project

Re: Fwd: about Issue 2118: Let Unicode leadingChar = 0

Nicolas Cellier
Unicode Leading Char
--------------------
The default leading char is now 0 instead of 255 making Unicode - UTF
conversions stable. See also discussion at
http://lists.squeakfoundation.org/pipermail/squeak-dev/2009-August/138915.html


2010/3/8 Stéphane Ducasse <[hidden email]>:

> Hi Nicolas,
>
> could you give us an executive summary of the reasoning behind this change?
> Does this imply that we get Unicode by default,
> and that Unicode is integrated in the table of languages (replacing Latin1),
> so that we no longer have to check all the time whether the leading char is 255?
>
> Stef
> I like all these opportunities to learn things.


Re: Fwd: about Issue 2118: Let Unicode leadingChar = 0

Stéphane Ducasse
Thanks a lot.
I think that this is a good decision.

Here is the mail from Andreas that Nicolas mentioned.

Folks -

I think it's time to do something about the leadingChar in Characters
that has been on the TODO list for a while. I have been looking over
this stuff for some time now, fixing things here and there and laying
some of the ground work for the things to come.

Here is the good news: Squeak doesn't need the leadingChar any longer.
If you are running an updated trunk image you can run entirely without
the leadingChar being used, and I've done this for about a week now with
no ill side effects (disclaimer: I haven't been using very much of m17n
support stuff so there may still be breakage but it means it won't
explode in your face straightaway). If you would like to try yourself,
all you need to do is to hack Character>>setValue: to say, e.g.,

        value := newValue bitClear: 16r3FC00000.

and you're good (and won't ever see a leadingChar). However, the removal
of the leading char could be used to do a couple of other things that I
would like to discuss and solicit feedback on, in particular from the
folks who care about the leadingChar.
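
To see what that mask does, here is a small workspace sketch using plain integer arithmetic. It assumes only what the mask 16r3FC00000 itself implies, namely that the leadingChar sits in bits 22-29 of the value and the character code in the low 22 bits. The variable names are made up for illustration and do not exist in the image.

```smalltalk
"Workspace sketch; assumes the leadingChar occupies bits 22-29 of the value."
| leadingCharMask tagged stripped removedTag |
leadingCharMask := 16r3FC00000.
"Code 353 tagged with leadingChar 255."
tagged := (255 bitShift: 22) + 353.
"Same operation as in the setValue: hack: clear the tag bits."
stripped := tagged bitClear: leadingCharMask.
"Recover the tag that was cleared."
removedTag := (tagged bitAnd: leadingCharMask) bitShift: -22.
stripped = 353.      "true: only the character code remains"
removedTag = 255.    "true: this is the leadingChar that was dropped"
```

Evaluating the last two expressions with "print it" should answer true in both cases, since clearing the mask leaves the low 22 bits untouched.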

The main insight is that although we *can* run without the leadingChar,
it doesn't mean we *have* to. As it stands, the leading char is used for
two purposes: Character set selection (EncodedCharSet) and (parts of)
language support. There is a significant amount of confusion between the
two, with Latin1/Latin2Environment being subclasses of LanguageEnvironment
(although these are character encodings, not languages).

What I would propose to do here is to redefine "leadingChar = 0", which
currently means "Latin1 encoding, language neutral", to mean "Unicode
encoding, language neutral". What this does is that "Character value: 353"
and "Unicode value: 353" become the same if the environment is considered
language neutral, which by default it would be.
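
As a rough illustration of why the two become the same, the workspace sketch below models a character value as (leadingChar bitShift: 22) + code. That packing is an assumption consistent with the mask 16r3FC00000 used in the setValue: hack, and the block name packValue is hypothetical. With the old Unicode leadingChar of 255, code 353 ends up with a different value than the untagged 353; with leadingChar = 0 the two coincide.

```smalltalk
"Workspace sketch; packValue is a hypothetical helper, not an image method."
| packValue plain oldUnicode newUnicode |
packValue := [:leadingChar :code | (leadingChar bitShift: 22) + code].
plain := 353.                                   "value with no tag bits set"
oldUnicode := packValue value: 255 value: 353.  "Unicode tagged with 255"
newUnicode := packValue value: 0 value: 353.    "Unicode tagged with 0"
oldUnicode = plain.   "false: the tag bits make the values differ"
newUnicode = plain.   "true: leadingChar = 0 leaves the plain code point"
```

This is also one way to read the remark that conversions become stable: a code point produced by a UTF decoder no longer needs tag bits added or stripped to compare equal to the untagged value.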

All but the environments that care about the connotations of the
language tag should be able to work with this definition without any
change whatsoever. The only thing that changes is that the default
LanguageEnvironment is Unicode based, using leadingChar=0; most of the
subclasses go away (being replaced by the default LanguageEnvironment),
and for those that we care about or that need a transition plan (i.e., the
CJK languages) we keep using the language tag for the time being.

That means that *if* you set your language environment to be one of the
CJK languages you get a language tag in your strings, but by default the
language neutral environment will produce "plain Unicode". Which should
make the server/seaside/aida people a lot more happy when dealing with
this stuff.

For the CJK languages (or other languages requiring support that has
so far been expressed via the language tag) we can use this opportunity
to phase out the use of the language tag in favor of text
attributes (which would have to be written first).

The main advantage of the proposal is that the people who would like to
use plain Unicode get to use it, and the people who care about the
language tag and its consequences can still use that as well.

How does that sound?

Cheers,
   - Andreas

On Mar 8, 2010, at 10:49 PM, Nicolas Cellier wrote:

> Unicode Leading Char
> --------------------
> The default leading char is now 0 instead of 255 making Unicode - UTF
> conversions stable. See also discussion at
> http://lists.squeakfoundation.org/pipermail/squeak-dev/2009-August/138915.html

