Hello Germán and Juan
As we have seen, Cuis handles Unicode to a certain limited extent.

I will post a summary of what I know about it later. I am interested in working on, or contributing to, an add-on which loads Unicode support into Cuis.

For general work I need

a)
an add-on so that Cuis can process arbitrary UTF-8 text files. The majority of the characters in the content will fall into the
https://de.wikipedia.org/wiki/ISO_8859-15
range, so it is fine if the other characters are rendered as \unnn or &#nnn;

b)
Another, more rewarding but maybe more difficult, way would be to replace the String class with a class which handles 16-bit characters instead of 8-bit characters. In terms of structure everything would remain the same. Characters would be 16 bits wide, as in Java.

This will come later. At the moment I am working on ContentPack version 2, which will run on Cuis, Squeak and Pharo.

Kind regards

--Hannes

> 2013/1/22 Germán Arduino <[hidden email]>:
>> Thanks for the comments Hannes / Juan:
>>
>> I will look into it when have time, or if you prefer Hannes and want
>> to help I will integrate it when finish with Aida.
>>
>> Germán.
>>
>> 2013/1/21 Juan Vuletich <[hidden email]>:
>>> Hi Germán,
>>>
>>> Cool! Just a remark: Cuis does include conversion to/from utf-8 for the
>>> charset it supports (ISO-8859-15, covering nearly all the latin
>>> alphabets).
>>>
>>> Cheers,
>>> Juan Vuletich
>>>
>>> Germán Arduino wrote:
>>>>
>>>> Hi:
>>>>
>>>> The first versions of Sport and Swazoo working in Cuis 4.1 with all
>>>> tests green are ready to install.
>>>>
>>>> The changes I did in Swazoo are:
>>>>
>>>> - Avoid Unicode support that don't exist in Cuis

_______________________________________________
Cuis mailing list
[hidden email]
http://jvuletich.org/mailman/listinfo/cuis_jvuletich.org
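Option a) above can be illustrated outside of Cuis. The following is only a sketch in Python, not Cuis code: it assumes the text arrives as UTF-8 bytes and that numeric character references of the form &#nnn; are an acceptable fallback. The function name `to_latin9_with_refs` is made up for this example.

```python
def to_latin9_with_refs(utf8_bytes: bytes) -> bytes:
    """Decode UTF-8 and re-encode as ISO 8859-15.

    Characters that ISO 8859-15 cannot represent are written as
    decimal numeric character references (&#nnn;) instead of being dropped.
    """
    text = utf8_bytes.decode("utf-8")
    # The standard 'xmlcharrefreplace' error handler produces &#nnn; references.
    return text.encode("iso8859_15", errors="xmlcharrefreplace")


sample = "Grüße, 5 € and Ω".encode("utf-8")
print(to_latin9_with_refs(sample))
# prints b'Gr\xfc\xdfe, 5 \xa4 and &#937;'
```

The euro sign survives as the single byte 0xA4 because it is part of ISO 8859-15, while Omega falls outside the set and becomes a reference.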
It would be nice if you develop the needed code!

My first need concerns the Swazoo methods I mentioned in another mail, but I think that is simpler; I just was not aware of the support already in place in Cuis itself.

Germán.

--
Sincerely,
Germán Arduino
about.me/garduino
Hello Germán
On 1/22/13, Germán Arduino <[hidden email]> wrote:
> Nice if you will develop the needed code!
>
> The first need I have is on the methods of Swazoo that I commented in
> other mail, but I think that is more simple, only that I don't was
> aware of the already inplace support in Cuis itself.

Yes, it also took me some time to find out that Cuis does have some limited Unicode support.

Juan originally wrote that Cuis had dropped Unicode support.

Looking at Cuis from the outside, I cannot say that this is the case, as Cuis consumes and writes UTF-8 text files. Unicode text snippets pasted through the clipboard into a Cuis TextEditor also come through well. The only limitation is that internally it handles only the code points which are in https://de.wikipedia.org/wiki/ISO_8859-15. And if I work in a Cuis workspace with

    nn asCharacter

where nn is an Integer, nn must belong to ISO 8859-15.

ISO 8859-15 is good for most European languages. If we had an add-on to cater for the occasional Unicode characters which do not fall into the set covered by ISO 8859-15, that would make UTF-8 text file processing with Cuis safe.

--Hannes
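To make the limitation concrete, here is a small Python sketch (again, not Cuis code) of what "belonging to ISO 8859-15" means. It assumes that nn is interpreted as a byte value in the ISO 8859-15 table, which is an assumption about `nn asCharacter` and not something confirmed in this thread. The helper names `latin9_char` and `fits_latin9` are hypothetical.

```python
def latin9_char(nn: int) -> str:
    """Return the character that byte value nn denotes in ISO 8859-15."""
    if not 0 <= nn <= 255:
        raise ValueError("ISO 8859-15 only defines byte values 0..255")
    return bytes([nn]).decode("iso8859_15")


def fits_latin9(ch: str) -> bool:
    """Tell whether a Unicode character can be stored in ISO 8859-15."""
    try:
        ch.encode("iso8859_15")
    except UnicodeEncodeError:
        return False
    return True


print(latin9_char(0xA4))   # the euro sign, which ISO 8859-15 places at 0xA4
print(fits_latin9("é"))    # an accented Latin letter
print(fits_latin9("Ω"))    # Greek Omega, code point 937
```

Note that a byte value and a Unicode code point are not always the same number: byte 0xA4 denotes the euro sign (U+20AC) in ISO 8859-15, while the Unicode character U+00A4 (the generic currency sign) is not part of ISO 8859-15 at all.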
The attached change set prevents Cuis from silently ignoring characters which are not in ISO 8859-15.

For example, if you paste a text snippet which contains the letter Omega (Ω) into a TextWindow, it is displayed as &#937;

The part which does the conversion the other way round is not included.

--Hannes

Attachment: 1573-CuisCore-HannesHirzel-2013Jan22-15h09m-hjh.1.cs.st (1K)
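The reverse direction, which the change set does not cover, could look roughly like the following Python sketch. It is not the Cuis implementation; it assumes that only decimal references of the form &#nnn; need to be expanded, and the name `expand_numeric_refs` is made up for this illustration.

```python
import re

# Matches decimal numeric character references such as &#937;
_NUMERIC_REF = re.compile(r"&#(\d+);")


def expand_numeric_refs(text: str) -> str:
    """Replace every &#nnn; reference with the character it denotes."""
    def _expand(match: re.Match) -> str:
        code_point = int(match.group(1))
        # Leave references outside the Unicode range untouched.
        if code_point > 0x10FFFF:
            return match.group(0)
        return chr(code_point)

    return _NUMERIC_REF.sub(_expand, text)


latin9_bytes = b"Gr\xfc\xdfe and &#937;"
restored = expand_numeric_refs(latin9_bytes.decode("iso8859_15"))
print(restored.encode("utf-8"))
```

Named entities such as &amp; are deliberately left alone here, since the thread only talks about numeric references.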
Thanks Hannes, just integrated this.
Cheers,
Juan Vuletich
Thank you Juan,
for adding the Unicode fix so that pasting text through the clipboard does not silently lose characters. More things like this (including comments) will follow later.

I have realized that what I wrote earlier is wrong. By default, Cuis reads and saves files in ISO 8859-15, not in a Unicode encoding. However, it is not too difficult to read and write a Unicode file.

I have started some notes on this here:
https://github.com/hhzl/Cuis-Add-Ons/blob/master/UnicodeNotes.md

Regards
Hannes
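The difference between the default file encoding and an explicit one can be sketched in Python as follows. This is an illustration of the general idea only and says nothing about the Cuis file API; the function name `transcode` and the file names are hypothetical.

```python
import os
import tempfile


def transcode(src, dst, src_encoding, dst_encoding, errors="strict"):
    """Read src in one encoding and write the same text to dst in another."""
    # newline="" keeps line endings exactly as they are in the source file.
    with open(src, encoding=src_encoding, newline="") as fin:
        text = fin.read()
    with open(dst, "w", encoding=dst_encoding, errors=errors, newline="") as fout:
        fout.write(text)


with tempfile.TemporaryDirectory() as folder:
    utf8_file = os.path.join(folder, "in-utf8.txt")
    latin9_file = os.path.join(folder, "out-latin9.txt")
    with open(utf8_file, "wb") as f:
        f.write("Zoë pays 5 €\n".encode("utf-8"))

    transcode(utf8_file, latin9_file, "utf-8", "iso8859_15")
    with open(latin9_file, "rb") as f:
        print(f.read())
```

With `errors="strict"` the conversion fails loudly on a character that ISO 8859-15 cannot hold; passing `errors="xmlcharrefreplace"` would write such characters as &#nnn; references instead.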
Thanks Hannes, this is very useful to me.
My next step in porting is to polish WebClient, where, among other things, Unicode is an issue.

Germán.
Your feedback, Germán,
makes me consider analyzing what it would involve to write a simple Unicode porting layer.

I am thinking of an add-on to Cuis which people can load if they want to work in a more Unicode-compliant way.

In fact 50% of all HTML files on the internet are encoded in UTF-8, and the text files I work with are mostly UTF-8. So if I use WebClient to download an HTML file and want to process it further, I have to deal with workarounds. Even HTML files in major European languages often contain Unicode characters such as special hyphens, quotation marks, graphical symbols, etc.

One idea I'd like to try out is simply to replace the class String, which only stores bytes (8 bits), with a String class which stores words (32 bits). It wastes some space, but conceptually it would be straightforward. Space measurements have shown that there are not all that many strings in Cuis; the major part is taken up by bitmaps.

I just have to figure out how to work with these variableByteSubclasses, with which I have not done much in the past.

--Hannes
Yes, I agree that having an installable layer would be the best option.

But Unicode and related topics are not my field of expertise, so I do not have much to add :(

2013/2/2 H. Hirzel <[hidden email]>:
> Your feedback, Germán,
>
> makes me consider analyzing what it would involve to write a simple
> Unicode porting layer.
>
> I think of an Add-On to Cuis which people can load if they want to
> work in a more Unicode-compliant way.
>
> In fact 50% of all HTML files on the internet are encoded in UTF8 and
> the text files with which I work are mostly UTF8. So if I use
> WebClient to download an HTML file and want to further process it I
> have to deal with workarounds. Even HTML files in major European
> languages often have Unicode characters like special hyphens, quotation
> marks, graphical symbols etc.
>
> One idea I'd like to try out is just to replace the class String which
> only stores bytes (8 bit) with a String class which stores words (32
> bit). It is a bit of a waste in terms of space but conceptually it would
> be straightforward. Space measurement has shown that there are not all
> that many strings in Cuis. The major part is taken by bitmaps.
>
> I just have to figure out how to work with these
> variableByteSubclasses with which I have not done much in the past.
>
> --Hannes
|
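Hannes's point about UTF-8 downloads that contain the occasional character outside ISO 8859-15 can be illustrated with a small sketch. It is written in Python rather than Cuis Smalltalk, it is not the change set discussed in this thread, and the helper name `to_latin9_with_ncr` is made up for the example. It only assumes Python's standard `iso8859-15` codec and its `xmlcharrefreplace` error handler, and it shows the "render the rest as &#nnn;" fallback mentioned earlier.

```python
def to_latin9_with_ncr(utf8_bytes: bytes) -> str:
    """Decode UTF-8 and keep only what ISO 8859-15 can represent.

    Characters outside ISO 8859-15 become decimal numeric character
    references such as &#937; (hypothetical helper, not a Cuis API).
    """
    text = utf8_bytes.decode("utf-8")
    # Encoding to ISO 8859-15 fails for characters outside that set;
    # the xmlcharrefreplace handler substitutes &#nnn; for each of them.
    latin9 = text.encode("iso8859-15", errors="xmlcharrefreplace")
    return latin9.decode("iso8859-15")


sample = "Germán: 10 € for Ω".encode("utf-8")
print(to_latin9_with_ncr(sample))
```

The accented letter and the euro sign survive because both are part of ISO 8859-15; only the Omega is rewritten, so no character is dropped silently.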
In reply to this post by Hannes Hirzel
This is cool. Good start. Someday I want to be able to have a class called 無 :D
On Tue, Jan 22, 2013 at 8:23 AM, H. Hirzel <[hidden email]> wrote:
> The attached change set prevents Cuis from silently ignoring

Casey Ransberger
|
Hello all
In the meantime I am investigating how to construct a small library which works with WideCharacters and WideStrings, together with the FileStream and UTF8Converter which deal with them.

As a start I filed out String and Character and changed the names and class references in them to WideString and WideCharacter. I can now create Unicode strings in Cuis. I will probably simplify both WideCharacter and WideString in order to focus more on the problem as such and learn how to implement it in a simple and straightforward way. The Unicode-Add-On library may then serve as a prerequisite for loading WebClient. Germán Arduino and I have to figure out what is actually needed.

To understand how WideCharacters work, it was helpful to have a look at the class ColorArray. It has only 4 methods.

The subclass definition is special:

    ArrayedCollection variableWordSubclass: #ColorArray
        instanceVariableNames: ''
        classVariableNames: ''
        poolDictionaries: ''
        category: 'Collections-Arrayed'

Using #variableWordSubclass: instead of the regular #subclass: means that an array of 32-bit integers is made available to work with.

A Color is similar to a Unicode character in the sense that an instance of the class Color can be completely described with a 32-bit integer. So internally the class ColorArray does not actually store instances of Color, though it is made to appear so when seen from outside.

When I want to access a color in aColorArray I do

    aColorArray at: index

and aColorArray internally accesses a 32-bit integer (= a word) and converts it to aColor by asking class Integer to do it:

    Integer>>asColorOfDepth: d
        "Return a color value representing the receiver as color of the given depth"
        ^Color colorFromPixelValue: self depth: d

Juan once wrote that he left out Unicode because he thought it is 'too complicated'. Looking at the implementation in Squeak I think things could be done differently. It depends on what is actually needed. Reviewing the code is surely a good thing. At the moment I'd like to go for a relatively thin layer to make web application porting straightforward.

Regards
Hannes
|
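The ColorArray analogy in Hannes's "Hello all" message (store plain 32-bit words, convert to an object only on access) can be sketched as follows. This is a Python illustration under stated assumptions, not the WideString that was filed out in Cuis: the class below and its `byte_size` method are hypothetical, and the 32-bit storage is emulated with the standard `array` module.

```python
from array import array

# Pick an array typecode whose items are exactly 32 bits wide on this platform.
WORD = next(code for code in "IL" if array(code).itemsize == 4)


class WideString:
    """Hypothetical fixed-width string: one 32-bit word per code point."""

    def __init__(self, text: str = ""):
        # Only integers are stored, never character objects.
        self._words = array(WORD, (ord(ch) for ch in text))

    def __len__(self) -> int:
        return len(self._words)

    def __getitem__(self, index: int) -> str:
        # Convert the stored word to a character on access, in the same
        # spirit as a ColorArray turning a stored word into a Color.
        return chr(self._words[index])

    def __setitem__(self, index: int, ch: str) -> None:
        self._words[index] = ord(ch)

    def byte_size(self) -> int:
        # Four bytes per character, whatever the character is.
        return len(self._words) * self._words.itemsize


name = WideString("無 Ω a")
print(name[0], len(name), name.byte_size())
```

The fixed width keeps indexing trivial at the price of four bytes per character, which is the space trade-off raised earlier in the thread.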
P.S.
The necessity for a Unicode solution becomes visible, for example, with the README.md file

https://github.com/hhzl/Cuis-WebClient/blob/master/README.md

"Germán Arduino" appears properly there, whereas in Cuis it is displayed as the attached screenshot shows.

Attachment: ScreenShotWithUTF8displayProblem.png (70K)
|
I forgot to mention the explanation
The README.md as used by github file is encoded in UFT8 http://en.wikipedia.org/wiki/UTF8 wheras the Cuis File List browser assumes that text files are encoded in ISO8859-15. http://en.wikipedia.org/wiki/ISO/IEC_8859-15 This actually calls for a preference to tell Cuis how to interpret text files - UTF8 or - ISO8859-15 On 2/5/13, H. Hirzel <[hidden email]> wrote: > P.S. > the necessity for a Unicode solution becomes visible for example with > the README.md file > > https://github.com/hhzl/Cuis-WebClient/blob/master/README.md > > " Germán Arduino" appears proplerly there whereas in Cuis it is > displayed as the attached screen shot shows. > > > > On 2/5/13, H. Hirzel <[hidden email]> wrote: >> Hello all >> >> In the meantime I am investigating how to construct a small library >> which works with WideCharacters and WideStrings and the FileStream and >> UTF8Converter which deals with it. >> >> As a start I filed out String and Character and changed the names and >> class references in it to WideString and WideCharacter. I now can >> create Unicode strings in Cuis. Probably I'll simplify both >> WideCharacter and WideString in order to be able to focus more on the >> problem as such and learn how to implement it in a simple and >> straightforward way. The Unicode-Add-On library then may serve as a >> prerequisite for loading WebClient. Germán Arduino and I have to >> figure out what actually is needed. >> >> Helpful to understand how WideCharacters work was to have a look at >> the class ColorArray. >> It only have 4 methods. >> >> The subclass definition is special >> >> ArrayedCollection variableWordSubclass: #ColorArray >> instanceVariableNames: '' >> classVariableNames: '' >> poolDictionaries: '' >> category: 'Collections-Arrayed' >> >> Using >> #variableWordSubclass: >> instead of the regular >> #subClass: >> >> means that the an array of 32bit integers is made available to work with. 
>> >> A Color is similar to a Unicode character in the sense that an >> instance of the class Color can be completely described with an 32 bit >> integer. So internally the class ColorArray does not actually store >> instances of Color though it is made to appear so as seen from >> outside. >> >> When I want to access a color in aColorArray I do >> aColorArray at: index >> >> and the aColorArray actually internally accesses a 32bit integer (= a >> word) and converts it to aColor by asking class Integer to do it >> >> Integer>> >> asColorOfDepth: d >> "Return a color value representing the receiver as color of the given >> depth" >> ^Color colorFromPixelValue: self depth: d >> >> Juan once wrote out that he left out Unicode because he thought it is >> 'too complicated'. Looking at the implementation in Squeak I think >> things could be done differently. It depends on what is actually >> needed. Reviewing the code is surely a good thing. At the moment I'd >> like to go for a relatively thin layer to make web application porting >> straightforward. >> >> >> Regards >> Hannes >> >> >> On 2/4/13, Casey Ransberger <[hidden email]> wrote: >>> This is cool. Good start. Someday I want to be able to have a class >>> called >>> 無 >>> :D >>> >>> On Tue, Jan 22, 2013 at 8:23 AM, H. Hirzel <[hidden email]> >>> wrote: >>> >>>> The attached change set prevents Cuis from silently ignoring >>>> characters which are not in ISO 8859-15. >>>> >>>> For example if you paste a text snippet which contains the letter >>>> Omega (Ω) into a TextWindow it is displayed as Ω >>>> >>>> The part which does it the other way round is not included. >>>> >>>> --Hannes >>>> >>>> >>>> >>>> On 1/22/13, H. Hirzel <[hidden email]> wrote: >>>> > Hello Germán >>>> > >>>> > On 1/22/13, Germán Arduino <[hidden email]> wrote: >>>> >> Nice if you will develop the needed code! 
>>>> >>
>>>> >> The first need I have is on the methods of Swazoo that I commented
>>>> >> on in the other mail, but I think that is simpler; I just wasn't
>>>> >> aware of the support already in place in Cuis itself.
>>>> >
>>>> > Yes, it also took me some time to find out that Cuis does indeed
>>>> > have some limited Unicode support.
>>>> >
>>>> > Juan originally wrote that Cuis had dropped Unicode support.
>>>> >
>>>> > When I look at Cuis from the outside I cannot say that this is the
>>>> > case, as Cuis reads and writes UTF-8 text files. Unicode text
>>>> > snippets pasted through the clipboard into a Cuis TextEditor also
>>>> > come through fine. The only limitation is that internally it only
>>>> > handles the code points which are in
>>>> > https://de.wikipedia.org/wiki/ISO_8859-15.
>>>> > And if I work in a Cuis workspace with
>>>> >
>>>> >     nn asCharacter
>>>> >
>>>> > where nn is an Integer, nn must belong to ISO 8859-15.
>>>> >
>>>> > ISO 8859-15 is good for most European languages. If we had an
>>>> > Add-On to cater for the occasional other Unicode characters which
>>>> > do not fall into the set covered by ISO 8859-15, that would make
>>>> > UTF-8 text file processing with Cuis safe.
>>>> >
>>>> > --Hannes
>>>> >
>>>> >>
>>>> >> Germán.
>>>> >>
>>>> >> --
>>>> >> Sincerely,
>>>> >> Germán Arduino
>>>> >> about.me/garduino
>>>
>>> --
>>> Casey Ransberger
|
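The ISO 8859-15 limit described in the quoted message above can be illustrated outside Cuis. The following is only a sketch in Python (not Smalltalk), assuming Python's standard `iso8859_15` codec as the definition of the character set; the function name `latin9_byte` is made up for illustration and is not part of Cuis or Squeak.

```python
def latin9_byte(code_point: int):
    """Return the ISO 8859-15 byte value for a Unicode code point,
    or None if the character has no place in that 8-bit set."""
    try:
        # encode() yields exactly one byte for every character in the set
        return chr(code_point).encode("iso8859_15")[0]
    except UnicodeEncodeError:
        return None


print(latin9_byte(0x00E9))  # é: same value as the code point, 233
print(latin9_byte(0x20AC))  # € sits at 0xA4 in ISO 8859-15, so 164
print(latin9_byte(0x0416))  # Ж is outside the set, so None
```

Note that the byte value and the Unicode code point differ for the euro sign, so an `nn asCharacter` style conversion has to decide which of the two numberings `nn` refers to.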
In reply to this post by Hannes Hirzel
On Tue, 5 Feb 2013 09:20:24 +0000
"H. Hirzel" <[hidden email]> wrote:

> Hello all
>
> In the meantime I am investigating how to construct a small library
> which works with WideCharacters and WideStrings and the FileStream and
> UTF8Converter which deals with it.

Hannes,

Indeed Unicode is moby complex.

http://www.unicode.org/versions/Unicode6.2.0/

I don't know if it helps, but Scheme has probably the minimal defined Unicode support -- basically read/write, code points, comparisons, and up/down-casing. The scheme standards group has argued Unicode implementation features for years. [See 7th draft]
http://scheme-reports.org/2012/working-group-1.html

Chibi-Scheme is a bytecode implementation written in C which implements this support.

https://code.google.com/p/chibi-scheme/

This might be a stretch, but the implementation strategy has been gone over by many eyeballs.

$0.02,
-KenD
|
Ken

Having a specification/implementation of a simple Unicode layer in another language to compare against is helpful.

https://code.google.com/p/chibi-scheme/source/browse/lib/scheme/char.sld
[1]

So my aim is to do something similar, in the sense that I want to leave Cuis 4.1 more or less as is (maybe minor corrections) and then have an Add-On for more Unicode support.

Thank you

--Hannes

[1]
(define-library (scheme char)
  (import (scheme base))
  (cond-expand
   (full-unicode
    (import (chibi char-set full)
            (chibi char-set base)
            (chibi iset base))
    (include "char/full.scm")
    (include "char/case-offsets.scm"))
   (else
    (include "char/ascii.scm")
    (import
     (only (chibi)
           string-ci<=? string-ci<? string-ci=? string-ci>=? string-ci>?
           char-ci<=? char-ci<? char-ci=? char-ci>=? char-ci>?
           char-alphabetic? char-lower-case? char-numeric?
           char-upper-case? char-whitespace? digit-value
           char-upcase char-downcase))))
  (include "digit-value.scm")
  (export
   char-alphabetic? char-ci<=? char-ci<? char-ci=? char-ci>=? char-ci>?
   char-downcase char-foldcase char-lower-case? char-numeric?
   char-upcase char-upper-case? char-whitespace? digit-value
   string-ci<=? string-ci<? string-ci=? string-ci>=? string-ci>?
   string-downcase string-foldcase string-upcase))
|
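To give a feel for how small such a layer can be, here is a sketch of three of the operations Ken lists (code points, case-insensitive comparison, case folding), written in Python with only built-in `str` methods. The function names are hypothetical, loosely modelled on the Scheme export list above; this is not a port of Chibi-Scheme and says nothing about how Cuis should structure its classes.

```python
def code_points(s: str) -> list:
    """Return the Unicode code points of a string as integers."""
    return [ord(ch) for ch in s]


def string_foldcase(s: str) -> str:
    """Case-fold a string for caseless matching."""
    return s.casefold()


def string_ci_equal(a: str, b: str) -> bool:
    """Case-insensitive equality: compare the case-folded forms."""
    return string_foldcase(a) == string_foldcase(b)


print(code_points("Œ"))                   # [338], i.e. U+0152
print(string_ci_equal("Œuvre", "œUVRE"))  # True
```

Folding both sides before comparing, instead of upcasing one of them, is the usual design choice because some characters change length when their case is changed.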
Hi people!
I just found:
http://wiki.squeak.org/squeak/857 Unicode at Squeak
http://www.is.titech.ac.jp/~ohshima/squeak/
http://www.is.titech.ac.jp/~ohshima/squeak/squeak-multilingual-e.html (pending, to read)

It's not clear to me (I'm not a Smalltalker/Squeaker/Cuiser):

Is Ohshima's work a change in the Squeak VM, or in the String class using pure Smalltalk/Squeak?
Why is that work not included in Cuis? Can't it be ported?

Angel "Java" Lopez
@ajlopez
|
Hello Angel
On 2/6/13, Angel Java Lopez <[hidden email]> wrote:
> Hi people!
>
> I just found:
> http://wiki.squeak.org/squeak/857 Unicode at Squeak
> http://www.is.titech.ac.jp/~ohshima/squeak/
> http://www.is.titech.ac.jp/~ohshima/squeak/squeak-multilingual-e.html
> (pending, to read)

Thank you for reminding us of these documents. They contain information about the implementation of Unicode in Squeak 3.8, which was released in 2005.

I have added the references you sent to the UnicodeNotes.md document
https://github.com/hhzl/Cuis-Add-Ons/blob/master/UnicodeNotes.md

> It's not clear to me (I'm not a Smalltalker/Squeaker/Cuiser):
>
> Is Ohshima's work a change in the Squeak VM, or in the String class
> using pure Smalltalk/Squeak?

It is mainly more Smalltalk code (String, ByteString, WideString, MultiByteFileStream, TextConverter, UTF8TextConverter, many more ...), but in addition certain changes had to be made to the virtual machine. For example, the clipboard is now in Unicode (UTF-8).

> Why is that work not included in Cuis? Can't it be ported?

That is what we are aiming at here :-) The question is what exactly? And how should we adapt/change it? Make it simpler?

I have started a repository
https://github.com/hhzl/Cuis-Multilingual-TextConversion
where I copy three classes of Squeak at the moment
https://github.com/hhzl/Cuis-Multilingual-TextConversion/tree/master/CopiedFromSqueak

(Actually I copy only two classes in full; for the abstract class TextConverter I filed in only the class definition, and I am now adding the methods I need one by one.) Maybe I will fold that code into UTF8TextConverter later.

The reason why it was not ported by Juan is that he wanted to focus on Morphic and leave out some complex subsystems like Unicode support, Monticello and others.

The Unicode support in Squeak models 'language'. For example, in Squeak 4.4 we have the TextConverter class referring to a LanguageEnvironment

    TextConverter class>>defaultSystemConverter
        ^LanguageEnvironment defaultSystemConverter

and then

    LanguageEnvironment class>>defaultSystemConverter
        SystemConverterClass ifNil: [SystemConverterClass := self currentPlatform class systemConverterClass].
        ^ SystemConverterClass new.

which refers to class Locale in the category 'System-Localization'.

So the question is what should be adapted.

The current Character class in Cuis is 8 bit only. Not that there couldn't be more, as the values are integers which are 32 bit, but it is restricted on purpose. What is named String in Cuis is a ByteString in Squeak. Juan has reworked the Character / String classes considerably. It is a nice implementation for ISO 8859-15 and in some cases surpasses what is in Squeak. And it is more 'compact' and 'cleaner'. And it has 'hooks' for Unicode as outlined here
https://github.com/hhzl/Cuis-Add-Ons/blob/master/UnicodeNotes.md#implementation-in-cuis-41

At the moment I want to focus on a library which, when added, permits Cuis 4.1 to read and write UTF-8 files. This is possible as of now, but not in the File List (see
https://github.com/hhzl/Cuis-Add-Ons/blob/master/UnicodeNotes.md#implementation-in-cuis-41
with the screen shot here
http://jvuletich.org/pipermail/cuis_jvuletich.org/attachments/20130205/915f4469/attachment-0001.png)

--Hannes
|
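The goal stated in this thread, reading UTF-8 text into an 8-bit ISO 8859-15 string while rendering everything else as `&#nnn;`, can be sketched in a few lines. The example below is in Python rather than Smalltalk and relies on the standard `xmlcharrefreplace` error handler; the function name is hypothetical, and nothing here reflects how the Cuis or Squeak converters are actually written.

```python
def utf8_to_latin9_with_refs(data: bytes) -> bytes:
    """Decode UTF-8 bytes and re-encode them as ISO 8859-15.

    Characters that do not exist in ISO 8859-15 are written as decimal
    numeric character references of the form &#nnn; so that the result
    fits into a string of 8-bit characters.
    """
    text = data.decode("utf-8")  # raises UnicodeDecodeError on malformed input
    return text.encode("iso8859_15", errors="xmlcharrefreplace")


sample = "Grüße, Ж €".encode("utf-8")
print(utf8_to_latin9_with_refs(sample))
# b'Gr\xfc\xdfe, &#1046; \xa4'
```

One caveat of this fallback: a file that already contains the literal text `&#1046;` becomes indistinguishable from one that contained the character itself, so writing the file back as UTF-8 is only lossless if such literals are escaped or known to be absent.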
On Wed, 6 Feb 2013 12:21:56 +0000
"H. Hirzel" <[hidden email]> wrote:

> The reason why it was not ported by Juan is that he wanted to focus on
> Morphic and leave out some complex subsystems like Unicode support,
> Monticello and others.

I'd just like to echo these sentiments. I think moving Morphic forward is the highest value.

IMHO, anything we can do to help and/or not hinder Juan is goodness.

Cheers,
-KenD
|
In reply to this post by Angel Java Lopez
Hi Angel,
On 2/6/2013 8:33 AM, Angel Java Lopez wrote:
> Hi people!
>
> I just found:
> http://wiki.squeak.org/squeak/857 Unicode at Squeak
> http://www.is.titech.ac.jp/~ohshima/squeak/
> http://www.is.titech.ac.jp/~ohshima/squeak/squeak-multilingual-e.html
> (pending, to read)
>
> It's not clear to me (I'm not a Smalltalker/Squeaker/Cuiser):
>
> Is Ohshima's work a change in the Squeak VM, or in the String class
> using pure Smalltalk/Squeak?
> Why is that work not included in Cuis? Can't it be ported?
>
> Angel "Java" Lopez
> @ajlopez

From http://www.jvuletich.org/Cuis/CuisReleaseNotes.html:

"For instance, Cuis also doesn't include Unicode support. The handling of Unicode characters and strings in Squeak falls in b (too complex), as the whole system is affected and c (not stable), as bugs are still arising, even after being used for many years. Besides, as the basic Character and String were not modified, but new classes for WideCharacter and WideString were introduced, we can also consider it falls under a (optional in nature)."

Cheers,
Juan Vuletich
|
In reply to this post by KenDickey
Thanks Folks!
Cheers,
Juan Vuletich
|