About adding a Unicode handling porting layer

About adding a Unicode handling porting layer

Hannes Hirzel
Hello Germán and Juan

As we have seen, Cuis handles Unicode to a limited extent.

I will post a summary write-up of what I know about it later. I am
interested in contributing to an add-on which loads Unicode
support into Cuis.

For general work I need

a)
an add-on so that Cuis can process arbitrary UTF-8 text files. However,
the majority of the content characters will fall into the
  https://de.wikipedia.org/wiki/ISO_8859-15
range. So it is fine if the other characters are rendered as \unnn or &#nnn;

b)
Another, more rewarding but maybe more difficult, way would be to
replace the String class with a class which handles 16-bit characters
instead of 8-bit characters. In terms of structure all would remain
the same. Characters would be 16-bit like in Java.
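
To make the trade-off in option (b) concrete, here is a minimal sketch in Python (not Cuis code; the helper name `utf16_units` is made up for this illustration). It counts how many 16-bit units a character needs in UTF-16, the encoding Java strings use. Characters in the Basic Multilingual Plane fit in one unit, but characters beyond U+FFFF need two, so a 16-bit Character class would cover most text without covering all of Unicode.

```python
# Illustration only: counts 16-bit code units per character in UTF-16.

def utf16_units(ch):
    """Number of 16-bit code units needed to store one character in UTF-16."""
    return len(ch.encode("utf-16-be")) // 2

samples = {
    "e acute": "\u00e9",      # inside ISO 8859-15
    "euro sign": "\u20ac",    # inside ISO 8859-15, code point above 255
    "omega": "\u03a9",        # outside ISO 8859-15, inside the BMP
    "G clef": "\U0001d11e",   # outside the BMP
}

for name, ch in samples.items():
    print(f"{name}: U+{ord(ch):04X} needs {utf16_units(ch)} unit(s)")
```

Only the last sample needs a surrogate pair, which is the case a plain 16-bit character type cannot represent as a single element.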


This will come later. At the moment I am working on ContentPack
version 2 which will run on Cuis, Squeak and Pharo.

Kind regards

--Hannes

> 2013/1/22 Germán Arduino <[hidden email]>:
>> Thanks for the comments Hannes / Juan:
>>
>> I will look into it when I have time, or if you prefer, Hannes, and want
>> to help, I will integrate it when I finish with Aida.
>>
>> Germán.
>>
>>
>>
>> 2013/1/21 Juan Vuletich <[hidden email]>:
>>> Hi Germán,
>>>
>>> Cool! Just a remark: Cuis does include conversion to/from UTF-8 for the
>>> charset it supports (ISO-8859-15, covering nearly all the Latin
>>> alphabets).
>>>
>>> Cheers,
>>> Juan Vuletich
>>>
>>> Germán Arduino wrote:
>>>>
>>>> Hi:
>>>>
>>>> The first versions of Sport and Swazoo working in Cuis 4.1 with all
>>>> tests green are ready to install.
>>>>
>>>> The changes I made in Swazoo are:
>>>>
>>>>
>>>> - Avoid Unicode support that doesn't exist in Cuis
>>>>
......

_______________________________________________
Cuis mailing list
[hidden email]
http://jvuletich.org/mailman/listinfo/cuis_jvuletich.org

Re: About adding a Unicode handling porting layer

garduino
It would be nice if you developed the needed code!

My first need concerns the Swazoo methods that I commented on in
another mail, but I think that is simpler; I just was not
aware of the support already in place in Cuis itself.

Germán.

--
Sincerely,
Germán Arduino
about.me/garduino


Re: About adding a Unicode handling porting layer

Hannes Hirzel
Hello Germán

On 1/22/13, Germán Arduino <[hidden email]> wrote:
> It would be nice if you developed the needed code!
>
> My first need concerns the Swazoo methods that I commented on in
> another mail, but I think that is simpler; I just was not
> aware of the support already in place in Cuis itself.

Yes, it took me some time as well to find out that Cuis does have
some limited Unicode support.

Juan originally wrote that Cuis had dropped Unicode support.

Looking at Cuis from the outside, I cannot say that this is the
case, as Cuis consumes and writes UTF-8 text files. Unicode text
snippets pasted through the clipboard into a Cuis TextEditor also come
in well. The only limitation is that internally it only handles the
code points which are in https://de.wikipedia.org/wiki/ISO_8859-15.
And if I work in a Cuis workspace with

    nn asCharacter

where nn is an Integer,

   nn must belong to ISO_8859-15.


ISO_8859-15 is good for most European languages. If we had an
add-on to cater for the occasional Unicode characters which do
not fall into the set covered by ISO_8859-15, that would make UTF-8 text
file processing with Cuis safe.
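
The restriction on nn can be illustrated outside Cuis. The following Python sketch is not Cuis code: the helper names `latin9_char` and `fits_latin9` are invented here, and it assumes that nn is read as an ISO 8859-15 byte value (0 to 255), which the thread does not state explicitly. It shows which character sits at a given byte value and whether a given Unicode character has a slot in the set at all.

```python
# Illustration only: ISO 8859-15 seen through Python's "iso8859_15" codec.

def latin9_char(nn):
    """Character stored at byte value nn (0..255) in ISO 8859-15."""
    return bytes([nn]).decode("iso8859_15")

def fits_latin9(ch):
    """True if the character can be stored as a single ISO 8859-15 byte."""
    try:
        ch.encode("iso8859_15")
        return True
    except UnicodeEncodeError:
        return False

print(latin9_char(0xE9))       # e acute, same position as in ISO 8859-1
print(latin9_char(0xA4))       # euro sign (ISO 8859-1 has the currency sign here)
print(fits_latin9("\u20ac"))   # True: the euro sign is part of ISO 8859-15
print(fits_latin9("\u03a9"))   # False: Omega is outside ISO 8859-15
```

Note that the euro sign fits even though its Unicode code point (U+20AC) is far above 255: membership in the set is not the same as having a small code point.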


--Hannes



Re: About adding a Unicode handling porting layer

Hannes Hirzel
The attached change set prevents Cuis from silently ignoring
characters which are not in ISO 8859-15.

For example if you paste a text snippet which contains the letter
Omega (Ω) into a TextWindow it is displayed as &#937;

The part which does it the other way round is not included.
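
The attached change set itself is not reproduced in this thread. As a language-neutral illustration of the same idea, here is a minimal Python sketch (the function names are made up for this example). It replaces every character outside ISO 8859-15 with a decimal numeric character reference, and also shows one possible way to go back in the other direction.

```python
# Illustration only, not the Cuis change set.
import html

def to_latin9_with_refs(text):
    """Encode to ISO 8859-15; unsupported characters become &#nnn; references."""
    return text.encode("iso8859_15", errors="xmlcharrefreplace")

def from_latin9_with_refs(data):
    """Decode ISO 8859-15 bytes and turn character references back into characters."""
    return html.unescape(data.decode("iso8859_15"))

encoded = to_latin9_with_refs("R = 10 \u03a9, 5 \u20ac")
print(encoded)                          # b'R = 10 &#937;, 5 \xa4'
print(from_latin9_with_refs(encoded))   # the original text, Omega restored
```

One caveat of this reverse step: `html.unescape` also converts named entities such as `&amp;`, and it cannot tell a reference produced by the fallback from one the author typed literally, so the round trip is only safe for text that does not already contain such sequences.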

--Hannes



Attachment: 1573-CuisCore-HannesHirzel-2013Jan22-15h09m-hjh.1.cs.st (1K)

Re: About adding a Unicode handling porting layer

Juan Vuletich
Thanks Hannes, just integrated this.

Cheers,
Juan Vuletich


Re: About adding a Unicode handling porting layer

Hannes Hirzel
Thank you Juan,
for adding the Unicode fix so that pasting text through the clipboard
does not silently lose characters. More things like this (including
comments) later.

I have realized that what I wrote earlier is wrong. Cuis reads and
saves files in ISO 8859-15 by default, not in Unicode. However, it
is not too difficult to read and write a Unicode file.
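
Why the default encoding matters can be shown with a small Python sketch (not Cuis code; `read_text` and `write_text` are names made up here). The same UTF-8 file is read once as UTF-8 and once as ISO 8859-15. The second read does not fail, it silently produces two wrong characters for every two-byte UTF-8 sequence.

```python
# Illustration only: one file, two interpretations of its bytes.
import os
import tempfile

def write_text(path, text, encoding):
    """Write text to a file using an explicitly named encoding."""
    with open(path, "w", encoding=encoding, newline="") as f:
        f.write(text)

def read_text(path, encoding):
    """Read a whole text file using an explicitly named encoding."""
    with open(path, encoding=encoding, newline="") as f:
        return f.read()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    write_text(path, "caf\u00e9\n", "utf-8")
    as_utf8 = read_text(path, "utf-8")
    as_latin9 = read_text(path, "iso8859_15")

print(repr(as_utf8))     # the e acute survives
print(repr(as_latin9))   # the two UTF-8 bytes show up as two separate characters
```

Naming the encoding explicitly on every read and write, as the two helpers do, is what avoids this kind of silent corruption.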

I have started some notes on this here
https://github.com/hhzl/Cuis-Add-Ons/blob/master/UnicodeNotes.md

Regards
Hannes


Re: About adding a Unicode handling porting layer

garduino
Thanks Hannes, this is very useful to me.

My next step in porting stuff is to polish WebClient and, among other
things, Unicode is an issue.

Germán.


Re: About adding a Unicode handling porting layer

Hannes Hirzel
Your feedback, Germán,

makes me consider analyzing what it would involve to write a simple
Unicode porting layer.

I am thinking of an add-on to Cuis which people can load if they want to
work in a more Unicode-compliant way.

In fact 50% of all HTML files on the internet are encoded in UTF-8 and
the text files I work with are mostly UTF-8. So if I use
WebClient to download an HTML file and want to process it further, I
have to deal with workarounds. Even HTML files in major European
languages often contain Unicode characters such as special hyphens, quotation
marks, graphical symbols etc.

One idea I'd like to try out is simply to replace the class String, which
only stores bytes (8 bits), with a String class which stores words (32
bits). It is a bit of a waste in terms of space, but conceptually it would
be straightforward. Space measurement has shown that there are not all
that many strings in Cuis. The major part is taken by bitmaps.

I just have to figure out how to work with these
variableByteSubclasses, with which I have not done much in the past.
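
The space cost mentioned above is easy to estimate. The sketch below is Python, not Cuis code, and the helper names `storage_bytes` and `as_words` are invented for this example. It compares one byte per character with one 32-bit word per character, and shows that with 32-bit words every character, including those beyond U+FFFF, occupies exactly one element.

```python
# Illustration only: 8-bit versus 32-bit storage per character.

def storage_bytes(text):
    """Return (bytes at 8 bits per character, bytes at 32 bits per character).

    The 8-bit figure is only defined for text that fits ISO 8859-15.
    """
    narrow = len(text.encode("iso8859_15"))
    wide = len(text.encode("utf-32-be"))
    return narrow, wide

def as_words(text):
    """One integer (a code point that fits in 32 bits) per character."""
    return [ord(ch) for ch in text]

print(storage_bytes("Gr\u00fc\u00dfe 5 \u20ac"))   # (9, 36): four times the space
print(as_words("\u03a9\U0001d11e"))                # [937, 119070]: one word each
```

The factor of four is the waste referred to above; the benefit is that indexing and length work per character with no multi-unit sequences to handle.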

--Hannes





On 2/1/13, Germán Arduino <[hidden email]> wrote:

> Thanks Hannes, this is very useful to me.
>
> My next step in porting stuff is polish WebClient and, between other
> things, Unicode is an issue.
>
> Germán.
>
> 2013/2/1 H. Hirzel <[hidden email]>:
>> Thank you Juan,
>> for adding the Unicode fix so that pasting text through the clipboard
>> does not silently loose characters. More things like this (including
>> comments) later.
>>
>> I have realized that what I wrote earlier is wrong. Cuis reads and
>> saves files in ISO8859-15 by default and not with Unicode. However it
>> is not too difficult to read and write a Unicode file.
>>
>> I have started some notes on this here
>> https://github.com/hhzl/Cuis-Add-Ons/blob/master/UnicodeNotes.md
>>
>> Regards
>> Hannes
>>
>> On 1/23/13, Juan Vuletich <[hidden email]> wrote:
>>> Thanks Hannes, just integrated this.
>>>
>>> Cheers,
>>> Juan Vuletich
>>>
>>> H. Hirzel wrote:
>>>> The attached change set prevents Cuis from silently ignoring
>>>> characters which are not in ISO 8859-15.
>>>>
>>>> For example if you paste a text snippet which contains the letter
>>>> Omega (Ω) into a TextWindow it is displayed as &#937;
>>>>
>>>> The part which does it the other way round is not included.
>>>>
>>>> --Hannes
>>>>
>>>>
>>>>
>>>> On 1/22/13, H. Hirzel <[hidden email]> wrote:
>>>>
>>>>> Hello Germán
>>>>>
>>>>> On 1/22/13, Germán Arduino <[hidden email]> wrote:
>>>>>
>>>>>> Nice if you will develop the needed code!
>>>>>>
>>>>>> The first need I have is on the methods of Swazoo that I commented in
>>>>>> other mail, but I think that is more simple, only that I don't was
>>>>>> aware of the already inplace support in Cuis itself.
>>>>>>
>>>>> Yes, it took me some time as well to find out that Cuis indeed has
>>>>> some limited Unicode support.
>>>>>
>>>>> Juan originally wrote that Cuis had dropped Unicode support.
>>>>>
>>>>> When I look at Cuis from the outside I cannot say that this is the
>>>>> case, as Cuis consumes and writes UTF8 text files. Unicode text
>>>>> snippets pasted through the clipboard into a Cuis TextEditor also pass
>>>>> in well. The only limitation is that internally it only handles the
>>>>> code points which are in https://de.wikipedia.org/wiki/ISO_8859-15.
>>>>> And if I work in a Cuis workspace  with
>>>>>
>>>>>     nn asCharacter
>>>>>
>>>>> where nn is an Integer
>>>>>
>>>>>    nn must belong to ISO_8859-15
>>>>>
>>>>>
>>>>> ISO_8859-15 is good for most European languages. If we had an
>>>>> Add-On to cater for the occasional Unicode characters which do
>>>>> not fall into the set covered by ISO_8859-15, that would make UTF8 text
>>>>> file processing with Cuis safe.
>>>>>
>>>>>
>>>>> --Hannes
>>>>>
>>>>>
>>>>>
>>>>>> Germán.

_______________________________________________
Cuis mailing list
[hidden email]
http://jvuletich.org/mailman/listinfo/cuis_jvuletich.org

Re: About adding a Unicode handling porting layer

garduino
Yes, I agree that having an installable layer would be the best option.

But Unicode and related topics are not my field of expertise, so I do not
have much to add :(




Re: About adding a Unicode handling porting layer

Casey Ransberger-2
In reply to this post by Hannes Hirzel
This is cool. Good start. Someday I want to be able to have a class called 無 :D

On Tue, Jan 22, 2013 at 8:23 AM, H. Hirzel <[hidden email]> wrote:
The attached change set prevents Cuis from silently ignoring
characters which are not in ISO 8859-15.

For example if you paste a text snippet which contains the letter
Omega (Ω) into a TextWindow it is displayed as &#937;

The part which does it the other way round is not included.

--Hannes
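
The behaviour described above (a character outside ISO 8859-15 shows up as a numeric character reference such as &#937;) can be sketched in a few lines. This is only an illustration in Python, not the Cuis change set itself; the function names are made up for the example, and the reverse direction, which the change set does not include, is shown purely as an assumption of how it might look.

```python
import re


def to_latin9_with_references(text: str) -> bytes:
    """Encode text as ISO 8859-15; anything outside that set becomes &#nnn;."""
    # 'xmlcharrefreplace' is a standard Python error handler that substitutes
    # a decimal numeric character reference for each unencodable character.
    return text.encode("iso8859_15", errors="xmlcharrefreplace")


def from_latin9_with_references(data: bytes) -> str:
    """Hypothetical reverse step: turn &#nnn; references back into characters."""
    text = data.decode("iso8859_15")
    return re.sub(r"&#(\d+);", lambda match: chr(int(match.group(1))), text)


omega_bytes = to_latin9_with_references("Ω")
restored = from_latin9_with_references(omega_bytes)
```

Note that the reverse step cannot tell a reference produced by the conversion from a literal "&#937;" that was already in the text, so a real implementation would need an escaping rule for that case.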




--
Casey Ransberger


Re: About adding a Unicode handling porting layer

Hannes Hirzel
Hello all

In the meantime I am investigating how to construct a small library
which works with WideCharacters and WideStrings, together with the
FileStream and UTF8Converter classes which deal with them.

As a start I filed out String and Character and changed the names and
class references in them to WideString and WideCharacter. I can now
create Unicode strings in Cuis. Probably I'll simplify both
WideCharacter and WideString in order to be able to focus more on the
problem as such and learn how to implement it in a simple and
straightforward way. The Unicode-Add-On library then may serve as a
prerequisite for loading WebClient. Germán Arduino and I have to
figure out what actually is needed.

To understand how WideCharacters work, it helped to have a look at
the class ColorArray.
It has only 4 methods.

The subclass definition is special:

ArrayedCollection variableWordSubclass: #ColorArray
        instanceVariableNames: ''
        classVariableNames: ''
        poolDictionaries: ''
        category: 'Collections-Arrayed'

Using
#variableWordSubclass:
instead of the regular
#subclass:

means that an array of 32-bit integers is made available to work with.

A Color is similar to a Unicode character in the sense that an
instance of the class Color can be completely described with a 32-bit
integer. So internally the class ColorArray does not actually store
instances of Color, though from the outside it appears to.

When I want to access a color in aColorArray I do
   aColorArray at: index

and aColorArray internally accesses a 32-bit integer (= a
word) and converts it to aColor by asking class Integer to do it:

Integer>>
  asColorOfDepth: d
        "Return a color value representing the receiver as color of the given depth"
        ^Color colorFromPixelValue: self depth: d
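
The same idea carried over to strings can be sketched as follows. This is a Python illustration only, not the WideString code mentioned above: the class name WordString and its methods are invented for the example. It stores one unsigned integer of at least 32 bits per character and converts to a character only when an element is accessed, the way ColorArray answers Colors without storing them.

```python
from array import array


class WordString:
    """Hypothetical analogue of ColorArray: stores code points, answers characters."""

    def __init__(self, text=""):
        # Typecode 'L' is an unsigned integer with at least 32 bits per element,
        # enough for any Unicode code point (maximum 0x10FFFF).
        self._words = array("L", (ord(ch) for ch in text))

    def __len__(self):
        return len(self._words)

    def __getitem__(self, index):
        # Only integer indexes are handled in this sketch. The stored integer
        # is converted to a character at access time.
        return chr(self._words[index])

    def __setitem__(self, index, char):
        self._words[index] = ord(char)

    def __str__(self):
        return "".join(chr(word) for word in self._words)


sample = WordString("GΩ無")
second = sample[1]  # Python indexes from 0, Smalltalk's at: from 1
```

The cost is the one already mentioned in this thread: every character takes at least four bytes, even when it would fit into one.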

Juan once wrote that he left out Unicode because he considered it
'too complicated'. Looking at the implementation in Squeak I think
things could be done differently. It depends on what is actually
needed. Reviewing the code is surely a good thing. At the moment I'd
like to go for a relatively thin layer to make web application porting
straightforward.


Regards
Hannes



Re: About adding a Unicode handling porting layer

Hannes Hirzel
P.S.
The need for a Unicode solution becomes visible, for example, with
the README.md file

https://github.com/hhzl/Cuis-WebClient/blob/master/README.md

"Germán Arduino" appears properly there, whereas in Cuis it is
displayed as the attached screenshot shows.



_______________________________________________
Cuis mailing list
[hidden email]
http://jvuletich.org/mailman/listinfo/cuis_jvuletich.org

Attachment: ScreenShotWithUTF8displayProblem.png (70K)

Re: About adding a Unicode handling porting layer

Hannes Hirzel
I forgot to include the explanation.

The README.md file, as used by GitHub, is encoded in UTF-8
http://en.wikipedia.org/wiki/UTF8 whereas the Cuis File List browser
assumes that text files are encoded in ISO 8859-15.

http://en.wikipedia.org/wiki/ISO/IEC_8859-15

This actually calls for a preference to tell Cuis how to interpret text files:

- UTF-8 or
- ISO 8859-15
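
A minimal sketch of what such a preference would change, written in Python rather than Smalltalk so it can be run anywhere. The helper name `decode_text_file` and its `encoding` parameter are made up for illustration and are not Cuis API; the point is only that the same bytes read differently under the two assumptions.

```python
def decode_text_file(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode file contents according to a user preference (hypothetical helper)."""
    if encoding not in ("utf-8", "iso8859-15"):
        raise ValueError("unsupported preference: " + encoding)
    return raw.decode(encoding)


# The bytes of a UTF-8 encoded file containing this name.
raw = "Germán Arduino".encode("utf-8")

right = decode_text_file(raw, "utf-8")       # preference matches the file
wrong = decode_text_file(raw, "iso8859-15")  # what an ISO 8859-15 reader sees

print(right)
print(wrong)
```

With the wrong preference, the two UTF-8 bytes of "á" come out as two separate characters, which is presumably the kind of garbling the attached screenshot shows.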


On 2/5/13, H. Hirzel <[hidden email]> wrote:

> P.S.
> the necessity for a Unicode solution becomes visible for example with
> the README.md file
>
> https://github.com/hhzl/Cuis-WebClient/blob/master/README.md
>
> "Germán Arduino" appears properly there, whereas in Cuis it is
> displayed as the attached screenshot shows.
>
>
>
> On 2/5/13, H. Hirzel <[hidden email]> wrote:
>> Hello all
>>
>> In the meantime I am investigating how to construct a small library
>> which works with WideCharacters and WideStrings and the FileStream and
>> UTF8Converter which deals with it.
>>
>> As a start I filed out String and Character and changed the names and
>> class references in it to WideString and WideCharacter. I now can
>> create Unicode strings in Cuis. Probably I'll simplify both
>> WideCharacter and WideString in order to be able to focus more on the
>> problem as such and learn how to implement it in a simple and
>> straightforward way. The Unicode-Add-On library then may serve as a
>> prerequisite for loading WebClient. Germán Arduino and I have to
>> figure out what actually is needed.
>>
>> To understand how WideCharacters work, it helped to have a look at
>> the class ColorArray.
>> It has only 4 methods.
>>
>> The subclass definition is special
>>
>> ArrayedCollection variableWordSubclass: #ColorArray
>> instanceVariableNames: ''
>> classVariableNames: ''
>> poolDictionaries: ''
>> category: 'Collections-Arrayed'
>>
>> Using
>> #variableWordSubclass:
>> instead of the regular
>> #subclass:
>>
>> means that an array of 32-bit integers is made available to work with.
>>
>> A Color is similar to a Unicode character in the sense that an
>> instance of the class Color can be completely described with a 32-bit
>> integer. So internally the class ColorArray does not actually store
>> instances of Color, though it is made to appear so as seen from
>> outside.
>>
>> When I want to access a color in aColorArray I do
>>    aColorArray at: index
>>
>> and aColorArray internally accesses a 32-bit integer (= a
>> word) and converts it to a Color by asking class Integer to do it:
>>
>> Integer>>asColorOfDepth: d
>>   "Return a color value representing the receiver as color of the given depth"
>>   ^Color colorFromPixelValue: self depth: d
>>
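
The same idea carried over to characters can be sketched as follows. This is Python, not Smalltalk, and `WordString` is a hypothetical class invented for this illustration: it keeps raw unsigned integers internally and converts to and from characters only at the access boundary, in the way ColorArray is described above.

```python
from array import array


class WordString:
    """Stores code points as raw unsigned words, hands out characters.

    Hypothetical class, loosely analogous to a ColorArray that keeps
    integers internally but answers Color instances.
    """

    def __init__(self, text: str = ""):
        # Type code 'L' is an unsigned integer of at least 32 bits per item.
        self._words = array("L", (ord(ch) for ch in text))

    def __len__(self) -> int:
        return len(self._words)

    def __getitem__(self, index: int) -> str:
        # Convert the stored integer to a character only on access.
        return chr(self._words[index])

    def __setitem__(self, index: int, ch: str) -> None:
        # Convert the character back to its integer code point for storage.
        self._words[index] = ord(ch)


s = WordString("Omega: ?")
s[7] = "Ω"
print(s[7], s._words[7])
```

Seen from outside, `s` behaves like a sequence of characters; the storage itself never holds character objects.
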
>> Juan once wrote that he left out Unicode because he thought it was
>> 'too complicated'. Looking at the implementation in Squeak I think
>> things could be done differently. It depends on what is actually
>> needed. Reviewing the code is surely a good thing. At the moment I'd
>> like to go for a relatively thin layer to make web application porting
>> straightforward.
>>
>>
>> Regards
>> Hannes
>>
>>
>> On 2/4/13, Casey Ransberger <[hidden email]> wrote:
>>> This is cool. Good start. Someday I want to be able to have a class
>>> called
>>> 無
>>> :D
>>>
>>> On Tue, Jan 22, 2013 at 8:23 AM, H. Hirzel <[hidden email]>
>>> wrote:
>>>
>>>> The attached change set prevents Cuis from silently ignoring
>>>> characters which are not in ISO 8859-15.
>>>>
>>>> For example if you paste a text snippet which contains the letter
>>>> Omega (Ω) into a TextWindow it is displayed as &#937;
>>>>
>>>> The part which does it the other way round is not included.
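
Both directions can be sketched in a few lines of Python. This is not the change set itself, only an illustration of the behaviour it describes; the two function names are made up. Python's standard `xmlcharrefreplace` error handler produces exactly the `&#nnn;` form, and `html.unescape` is used here as a stand-in for "the other way round".

```python
import html


def to_latin9_with_refs(text: str) -> bytes:
    """Encode to ISO 8859-15, writing &#nnn; for anything outside it."""
    return text.encode("iso8859-15", errors="xmlcharrefreplace")


def from_latin9_with_refs(data: bytes) -> str:
    """Decode ISO 8859-15 and turn &#nnn; references back into characters."""
    return html.unescape(data.decode("iso8859-15"))


encoded = to_latin9_with_refs("Ω costs 5 €")
print(encoded)
print(from_latin9_with_refs(encoded))
```

One caveat of this shortcut: `html.unescape` also expands named references such as `&amp;`, so text that legitimately contains such sequences would need a stricter reverse step.
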
>>>>
>>>> --Hannes
>>>>
>>>>
>>>>
>>>> On 1/22/13, H. Hirzel <[hidden email]> wrote:
>>>> > Hello Germán
>>>> >
>>>> > On 1/22/13, Germán Arduino <[hidden email]> wrote:
>>>> >> Nice if you will develop the needed code!
>>>> >>
>>>> >> The first need I have is in the Swazoo methods that I mentioned
>>>> >> in another mail, but I think that is simpler; I just wasn't
>>>> >> aware of the support already in place in Cuis itself.
>>>> >
>>>> > Yes, it also took me some time to find out that Cuis indeed has
>>>> > some limited Unicode support.
>>>> >
>>>> > Juan originally wrote that Cuis had dropped Unicode support.
>>>> >
>>>> > Looking at Cuis from the outside, I cannot say that this is the
>>>> > case, as Cuis reads and writes UTF-8 text files. Unicode text
>>>> > snippets pasted through the clipboard into a Cuis TextEditor also
>>>> > come in well. The only limitation is that internally it only handles
>>>> > the code points which are in https://de.wikipedia.org/wiki/ISO_8859-15.
>>>> > And if I work in a Cuis workspace with
>>>> >
>>>> >     nn asCharacter
>>>> >
>>>> > where nn is an Integer,
>>>> >
>>>> >    nn must belong to ISO 8859-15.
>>>> >
>>>> >
>>>> > ISO 8859-15 is good for most European languages. If we had an
>>>> > add-on to cater for the occasional Unicode characters which do
>>>> > not fall into the set covered by ISO 8859-15, that would make UTF-8
>>>> > text file processing with Cuis safe.
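
A small Python sketch of the membership test implied here. The function name is hypothetical, and it treats nn as a Unicode code point, which is an assumption: whether `asCharacter` in Cuis interprets nn as a code point or as an ISO 8859-15 byte value is not something this sketch claims to know.

```python
def in_iso_8859_15(code_point: int) -> bool:
    """True if the Unicode code point has a slot in ISO 8859-15."""
    try:
        chr(code_point).encode("iso8859-15")
        return True
    except UnicodeEncodeError:
        return False


for nn in (0xE1, 0x20AC, 0xA4, 937):
    print(hex(nn), in_iso_8859_15(nn))
```

The euro sign (U+20AC) is in the set while the generic currency sign (U+00A4) is not, because ISO 8859-15 reuses that byte position for the euro sign; Omega (937) falls outside, as in the change set example above.
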
>>>> >
>>>> >
>>>> > --Hannes
>>>> >
>>>> >
>>>> >>
>>>> >> Germàn.
>>>> >>


Re: About adding a Unicode handling porting layer

KenDickey
In reply to this post by Hannes Hirzel
On Tue, 5 Feb 2013 09:20:24 +0000
"H. Hirzel" <[hidden email]> wrote:

> Hello all
>
> In the meantime I am investigating how to construct a small library
> which works with WideCharacters and WideStrings and the FileStream and
> UTF8Converter which deals with it.

Hannes,

Indeed Unicode is moby complex.

        http://www.unicode.org/versions/Unicode6.2.0/

I don't know if it helps, but Scheme has probably the minimal defined Unicode support -- basically read/write, code points, comparisons, and up/down-casing. The scheme standards group has argued Unicode implementation features for years. [See 7th draft]
        http://scheme-reports.org/2012/working-group-1.html

Chibi-Scheme is a bytecode implementation written in C which implements this support.

        https://code.google.com/p/chibi-scheme/

This might be a stretch, but the implementation strategy has been gone over by many eyeballs.
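
To give a feeling for how small that surface is, here is a rough Python rendering of the four areas named above (read/write, code points, comparisons, casing). The function names are invented for this sketch and only loosely echo the Scheme procedure names; this is not a port of any Scheme implementation.

```python
def read_text(raw: bytes) -> str:
    """Read: UTF-8 bytes to a string."""
    return raw.decode("utf-8")


def write_text(text: str) -> bytes:
    """Write: string to UTF-8 bytes."""
    return text.encode("utf-8")


def code_point(ch: str) -> int:
    """Character to integer code point."""
    return ord(ch)


def string_less(a: str, b: str) -> bool:
    """Comparison by code point order, with no locale-aware collation."""
    return a < b


def string_ci_equal(a: str, b: str) -> bool:
    """Case-insensitive comparison via case folding."""
    return a.casefold() == b.casefold()


print(code_point("Ω"), "ω".upper(), string_ci_equal("STRASSE", "straße"))
```

Note what is absent: normalization, collation, and rendering, which is where most of the complexity of Unicode sits.
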

$0.02,
-KenD


Re: About adding a Unicode handling porting layer

Hannes Hirzel
Ken

Having a comparison of a specification/implementation of a simple
Unicode layer in another language is helpful.

https://code.google.com/p/chibi-scheme/source/browse/lib/scheme/char.sld  [1]

So my aim is to do something similar, in the sense that I want to
leave Cuis 4.1 more or less as is (with maybe minor corrections) and then
have an add-on for more Unicode support.

Thank you

--Hannes



[1]
(define-library (scheme char)
  (import (scheme base))
  (cond-expand
   (full-unicode
    (import (chibi char-set full)
            (chibi char-set base)
            (chibi iset base))
    (include "char/full.scm")
    (include "char/case-offsets.scm"))
   (else
    (include "char/ascii.scm")
    (import
     (only (chibi)
           string-ci<=? string-ci<? string-ci=? string-ci>=? string-ci>?
           char-ci<=? char-ci<? char-ci=? char-ci>=? char-ci>?
           char-alphabetic? char-lower-case? char-numeric?
           char-upper-case? char-whitespace? digit-value
           char-upcase char-downcase))))
  (include "digit-value.scm")
  (export
   char-alphabetic? char-ci<=? char-ci<? char-ci=? char-ci>=? char-ci>?
   char-downcase char-foldcase char-lower-case? char-numeric?
   char-upcase char-upper-case? char-whitespace? digit-value
   string-ci<=? string-ci<? string-ci=? string-ci>=? string-ci>?
   string-downcase string-foldcase string-upcase))


Re: About adding a Unicode handling porting layer

Angel Java Lopez
Hi people!

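I just found:
http://wiki.squeak.org/squeak/857 Unicode at Squeak
http://www.is.titech.ac.jp/~ohshima/squeak/
http://www.is.titech.ac.jp/~ohshima/squeak/squeak-multilingual-e.html (pending, to read)

It's not clear to me (I'm not a Smalltalker/Squeaker/Cuiser):

Is Ohshima's work a change in the Squeak VM, or in the String class using pure Smalltalk/Squeak?
Why is that work not included in Cuis? Can't it be ported?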

Angel "Java" Lopez
@ajlopez


Re: About adding a Unicode handling porting layer

Hannes Hirzel
Hello Angel

On 2/6/13, Angel Java Lopez <[hidden email]> wrote:
> Hi people!
>
> I just found:
> http://wiki.squeak.org/squeak/857 Unicode at Squeak
> http://www.is.titech.ac.jp/~ohshima/squeak/
> http://www.is.titech.ac.jp/~ohshima/squeak/squeak-multilingual-e.html (pending,
> to read)

Thank you for reminding us of these documents. They contain
information about the implementation of Unicode in Squeak 3.8, which
was released in 2005.

I have added the references you sent to the UnicodeNotes.md document
https://github.com/hhzl/Cuis-Add-Ons/blob/master/UnicodeNotes.md

> It's not clear to me (I'm not a Smalltalker/Squeaker/Cuiser):
>
> Ohsima work, is a change in the Squeak VM? or in String class using pure
> Smalltalk/Squeak?

It is mainly more Smalltalk code (String, ByteString, WideString,
MultiFileByteStream, TextConverter, UTF8TextConverter, many more
.....) but in addition certain changes had to be made to the virtual
machine. For example the clipboard is now in Unicode (UTF8).
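
The split between a byte-sized and a word-sized string class can be illustrated with a small Python sketch. This is an assumption-laden analogy based only on the class names mentioned above, not a description of how Squeak actually decides; `make_string_storage` is a hypothetical helper.

```python
from array import array


def make_string_storage(text: str):
    """Pick the narrowest storage that fits the text (hypothetical helper)."""
    if all(ord(ch) < 256 for ch in text):
        # One byte per character, in the spirit of a ByteString.
        return bytes(ord(ch) for ch in text)
    # One word (at least 32 bits) per character, in the spirit of a WideString.
    return array("L", (ord(ch) for ch in text))


narrow = make_string_storage("Germán")
wide = make_string_storage("Ω")
print(type(narrow).__name__, type(wide).__name__)
```

The byte values in the narrow case are plain code points below 256, which is not quite the same thing as ISO 8859-15 byte values; a real design would have to pick one of the two.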

> Why not is that work included in Cuis? It cannot be ported?

That is what we are aiming at here   :-)

The question is what exactly? And how should we adapt/change it? Make
it simpler?

I have started a repository
     https://github.com/hhzl/Cuis-Multilingual-TextConversion

where I am copying three classes from Squeak at the moment

    https://github.com/hhzl/Cuis-Multilingual-TextConversion/tree/master/CopiedFromSqueak

Actually I copy only two classes in full. For the abstract class
TextConverter I filed in only the class definition, and I am now adding
the methods I need one by one. Maybe I will fold that code later into
        UTF8TextConverter


The reason why it was not ported by Juan is that he wanted to focus on
Morphic and leave out some complex subsystems like Unicode support,
Monticello and others.

The Unicode support in Squeak models 'language'.

For example, in Squeak 4.4 we have the TextConverter class referring to
a LanguageEnvironment

TextConverter class>>defaultSystemConverter

        ^LanguageEnvironment defaultSystemConverter


and then

LanguageEnvironment class>>defaultSystemConverter

        SystemConverterClass ifNil: [SystemConverterClass := self currentPlatform class systemConverterClass].
        ^ SystemConverterClass new.


which refers to class Locale in the category 'System-Localization'
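
The shape of these two methods (pure delegation, then a lazily cached class that is instantiated on each request) can be restated in Python to make the dependency chain easier to see. All names below are stand-ins invented for this sketch; in particular `Utf8Converter` and `platform_converter_class` are assumptions and do not mirror Squeak's actual classes.

```python
class Utf8Converter:
    """Stand-in converter class; hypothetical."""

    def decode(self, raw: bytes) -> str:
        return raw.decode("utf-8")


class LanguageEnvironment:
    # Cached on first use, like the SystemConverterClass variable above.
    _system_converter_class = None

    @classmethod
    def platform_converter_class(cls):
        # Placeholder for the platform/locale lookup; always UTF-8 here.
        return Utf8Converter

    @classmethod
    def default_system_converter(cls):
        if cls._system_converter_class is None:
            cls._system_converter_class = cls.platform_converter_class()
        return cls._system_converter_class()


class TextConverter:
    @classmethod
    def default_system_converter(cls):
        # Pure delegation, as in the first method quoted above.
        return LanguageEnvironment.default_system_converter()


converter = TextConverter.default_system_converter()
print(converter.decode(b"\xce\xa9"))
```

A thin add-on could collapse this chain into a single fixed converter if the locale machinery is not needed.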

So the question is what should be adapted.

The current Character class in Cuis is 8-bit only. That is not because
there couldn't be more, as the values are integers which are 32 bits
wide; it is restricted on purpose.

What is named String in Cuis is a ByteString in Squeak. Juan has
reworked the Character / String classes considerably. It is a nice
implementation for ISO8859-15 and in some cases surpasses what is in
Squeak. And it is more 'compact' and 'cleaner'. And it has 'hooks' for
Unicode as outlined here

https://github.com/hhzl/Cuis-Add-Ons/blob/master/UnicodeNotes.md#implementation-in-cuis-41

At the moment I want to focus on a library which, when added, permits
Cuis 4.1 to read and write UTF-8 files. This is possible as of now, but
not in the File List (see
https://github.com/hhzl/Cuis-Add-Ons/blob/master/UnicodeNotes.md#implementation-in-cuis-41
with the screenshot here
http://jvuletich.org/pipermail/cuis_jvuletich.org/attachments/20130205/915f4469/attachment-0001.png)
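
One detail any file-reading converter has to get right is that a multi-byte UTF-8 sequence can be split across two reads. A sketch of that in Python, using the standard library's incremental decoder; the function name `decode_chunks` is made up, and nothing here reflects how the Squeak or Cuis stream classes are actually structured.

```python
import codecs


def decode_chunks(chunks):
    """Decode UTF-8 arriving in arbitrary byte chunks (hypothetical helper)."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    for chunk in chunks:
        text = decoder.decode(chunk)  # incomplete trailing bytes are buffered
        if text:
            yield text
    tail = decoder.decode(b"", final=True)  # raises if a sequence is left unfinished
    if tail:
        yield tail


raw = "Germán".encode("utf-8")
chunks = [raw[:5], raw[5:]]  # the split falls inside the two bytes of 'á'
pieces = list(decode_chunks(chunks))
print(pieces)
```

Decoding each chunk on its own would fail or garble the text at the boundary; keeping decoder state between reads avoids that.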

--Hannes



Re: About adding a Unicode handling porting layer

KenDickey
On Wed, 6 Feb 2013 12:21:56 +0000
"H. Hirzel" <[hidden email]> wrote:

> The reason why it was not ported by Juan is that he wanted to focus on
> Morphic and leave out some complex subsystems like Unicode support,
> Monticello and others.

I'd just like to echo these sentiments. I think moving Morphic forward is the highest value.

IMHO, anything we can do to help and/or not hinder Juan is goodness.

Cheers,
-KenD


Re: About adding a Unicode handling porting layer

Juan Vuletich-4
In reply to this post by Angel Java Lopez
Hi Angel,

On 2/6/2013 8:33 AM, Angel Java Lopez wrote:

> Hi people!
>
> I just found:
> http://wiki.squeak.org/squeak/857 Unicode at Squeak
> http://www.is.titech.ac.jp/~ohshima/squeak/
> http://www.is.titech.ac.jp/~ohshima/squeak/squeak-multilingual-e.html
> (pending, to read)
>
> It's not clear to me (I'm not a Smalltalker/Squeaker/Cuiser):
>
> Ohsima work, is a change in the Squeak VM? or in String class using
> pure Smalltalk/Squeak?
> Why not is that work included in Cuis? It cannot be ported?
>
> Angel "Java" Lopez
> @ajlopez

From http://www.jvuletich.org/Cuis/CuisReleaseNotes.html:

"For instance, Cuis also doesn't include Unicode support. The handling
of Unicode characters and strings in Squeak falls in b (too complex), as
the whole system is affected and c (not stable), as bugs are still
arising, even after being used for many years. Besides, as the basic
Character and String were not modified, but new classes for
WideCharacter and WideString were introduced, we can also consider it
falls under a (optional in nature)."

Cheers,
Juan Vuletich


Re: About adding a Unicode handling porting layer

Juan Vuletich-4
In reply to this post by KenDickey
Thanks Folks!

Cheers,
Juan Vuletich
