Cleaning up Locale, LanguageEnvironment and EncodedCharSet

Guillermo Polito
Hi guys,

I took a deep look today at Locale, LanguageEnvironment and EncodedCharset, with the goal of understanding them and seeing how we can organize them better.

From what I saw, in almost all cases the locale and language environment are only used to get the current system's encoding. Moreover, the system's encoding is not obtained from the system's configuration but guessed. I also saw that leadingCharacter is used very little in the image, and I'd say that most of the time we would not be using it: we will be using Unicode.

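I proposed in the issue tracker a three-step change that cleans this up:
- Cleaning Locale's API (https://pharo.fogbugz.com/f/cases/16379/Cleanup-Locale-API)
- Fix users to make use of this new API (https://pharo.fogbugz.com/f/cases/16380/Make-use-of-new-Locale-API)
- Remove old unused code (https://pharo.fogbugz.com/f/cases/16381/Removed-unused-Locale-code)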

I know that with this change we temporarily lose the ability to use other language environments (like Greek or Japanese), and thus to switch to system encodings other than UTF-8/16/32. However, I believe that we should not 'guess' the system converter from the system language but rather ask the system which encoding it is using. This should maybe be added as a primitive (like the others that already exist in the Locale class, for example).
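
To make the difference between guessing and asking concrete, here is a minimal sketch in Python rather than Pharo. It is not the proposed primitive: the lookup table and the function names `guess_encoding` and `ask_system_encoding` are hypothetical and exist only for illustration. The only real API used is Python's standard `locale.getpreferredencoding` and `codecs.lookup`.

```python
import codecs
import locale

# Hypothetical lookup table, for illustration only: the kind of
# language -> encoding guess that the message argues against.
GUESSED_ENCODING_BY_LANGUAGE = {
    "ja": "shift_jis",
    "el": "iso8859_7",
    "en": "latin_1",
}


def guess_encoding(language_code, default="utf_8"):
    """Guess an encoding from a language code; it may not match the real system."""
    return GUESSED_ENCODING_BY_LANGUAGE.get(language_code, default)


def ask_system_encoding():
    """Ask the platform which encoding it reports, instead of guessing."""
    name = locale.getpreferredencoding(False)
    try:
        # Normalise to Python's canonical codec name so results are comparable.
        return codecs.lookup(name).name
    except LookupError:
        # Unknown to Python's codec registry: return the raw name unchanged.
        return name


if __name__ == "__main__":
    print("guessed for 'ja':", guess_encoding("ja"))
    print("reported by system:", ask_system_encoding())
```

On a Japanese system configured for UTF-8, the guess and the reported value would disagree, which is the case the message is concerned about.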

If somebody could review these issues, or has a comment on something that I should be aware of, I'd be grateful :).

Guille

Re: Cleaning up Locale, LanguageEnvironment and EncodedCharSet

Sven Van Caekenberghe-2

> On 25 Aug 2015, at 16:01, Guillermo Polito <[hidden email]> wrote:
>
> Hi guys,
>
> I took a deep look today at Locale, LanguageEnvironment and EncodedCharset, with the goal of understanding them and seeing how we can organize them better.
>
> From what I saw, in almost all cases the locale and language environment are only used to get the current system's encoding. Moreover, the system's encoding is not obtained from the system's configuration but guessed. I also saw that leadingCharacter is used very little in the image, and I'd say that most of the time we would not be using it: we will be using Unicode.
>
> I proposed in the issue tracker a three-step change that cleans this up:
> - Cleaning Locale's API (https://pharo.fogbugz.com/f/cases/16379/Cleanup-Locale-API)
> - Fix users to make use of this new API (https://pharo.fogbugz.com/f/cases/16380/Make-use-of-new-Locale-API)
> - Remove old unused code (https://pharo.fogbugz.com/f/cases/16381/Removed-unused-Locale-code)

Yes, that seems the way to go.

> I know that with this change we temporarily lose the ability to use other language environments (like Greek or Japanese), and thus to switch to system encodings other than UTF-8/16/32.

Why do you say that? As far as I understand it, we would not lose anything at all!

Leading char is a hack that does not seem to exist in other programming languages. AFAIU, it is only needed because there are (or might be, or used to be) a very small number of Unicode characters shared between three languages (I believe Japanese, Korean, and maybe Chinese) where the interpretation of the same Unicode character depends on the language. I am not at all sure it really is such a big deal, though I could be wrong.

Still, since we do UTF-8 (and some variations, as well as many byte encodings), I am pretty sure we support almost anything out there.
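
For readers unfamiliar with the leading char idea, the following Python sketch shows how a language tag can be packed next to a code point in a single integer. It assumes the layout used in Squeak-derived images of that era, with the code point in the low 22 bits and the leading char in the bits above; the exact bit widths, the function names `pack` and `unpack`, and the tag value 5 are assumptions made for illustration, not taken from this thread.

```python
CODE_BITS = 22                     # assumed width of the code point field
CODE_MASK = (1 << CODE_BITS) - 1   # 0x3FFFFF
MAX_LEADING_CHAR = 0xFF            # assumed 8-bit language tag


def pack(leading_char, code_point):
    """Combine a language tag and a code point into one character value."""
    if not 0 <= leading_char <= MAX_LEADING_CHAR:
        raise ValueError("leading char out of range")
    if not 0 <= code_point <= CODE_MASK:
        raise ValueError("code point out of range")
    return (leading_char << CODE_BITS) | code_point


def unpack(value):
    """Split a character value back into (leading_char, code_point)."""
    return value >> CODE_BITS, value & CODE_MASK


if __name__ == "__main__":
    han = 0x76F4                   # a CJK ideograph shared across languages
    plain = pack(0, han)           # leading char 0: plain Unicode
    tagged = pack(5, han)          # hypothetical tag for one language
    # Same code point, but the two values no longer compare equal.
    print(plain == tagged, unpack(plain), unpack(tagged))
```

With leading char 0 the packed value is just the code point, which is why dropping the non-zero tags leaves plain Unicode text untouched. The cost of the tag is that two characters with the same code point can compare as different.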

> However, I believe that we should not 'guess' the system converter from the system language but rather ask the system which encoding it is using. This should maybe be added as a primitive (like the others that already exist in the Locale class, for example).

> If somebody could review these issues, or has a comment on something that I should be aware of, I'd be grateful :).
>
> Guille



Re: Cleaning up Locale, LanguageEnvironment and EncodedCharSet

Guillermo Polito
Well, imagine the following case:

- your computer is configured with a locale like Japanese/Greek/Chinese
       AND
- you have activated the 'System > use Locale' setting. (I was not aware of this setting this morning ;))

Then the image will fetch the locale from the system and instantiate a language environment according to it. So you could end up using one of the LanguageEnvironment subclasses that could use a leadingChar ~~ 0, my friend :)

I do not know how likely that is, since that setting is off by default.



Re: Cleaning up Locale, LanguageEnvironment and EncodedCharSet

Sven Van Caekenberghe-2

> On 25 Aug 2015, at 16:36, Guillermo Polito <[hidden email]> wrote:
>
> Well, imagine the following case:
>
> - your computer is configured with a locale like Japanese/Greek/Chinese
>        AND
> - you have activated the 'System > use Locale' setting. (I was not aware of this setting this morning ;))
>
> Then the image will fetch the locale from the system and instantiate a language environment according to it. So you could end up using one of the LanguageEnvironment subclasses that could use a leadingChar ~~ 0, my friend :)

Yes, but would that make any difference?
I think we just don't know.
I am pretty sure it won't hurt at all.



Re: Cleaning up Locale, LanguageEnvironment and EncodedCharSet

Guillermo Polito
Exactly that. The only thing I wanted to say is that we cannot be sure.
But I also think the change is OK for most cases and will keep us running. If we discover that something is missing, we can start by adding tests later.
