Dear Squeakers,
Please find attached a Unicode patch, which improves the internal representation of Unicode characters. It:

1. introduces a new class TwoByteString
2. changes at:put: on ByteString and other such methods to "scale" the string to TwoByteString or FourByteString, depending on the width of the character
3. renames WideString to FourByteString for consistency, and also renames all related methods
4. adds the category CollectionTests-Unicode with tests
5. adds the class UnicodeBenchmarking for measuring the speed of Unicode handling, such as at:put: speed and UTF8 conversions on the included English, French, Slovenian, Russian and Chinese text.

ByteString and TwoByteString also include UTF8 conversion methods, which will probably be moved to UTF8TextConverter later.

I hope this patch will help improve Squeak Unicode support a bit.

Best regards
Janko

--
Janko Mivšek
AIDA/Web
Smalltalk Web Application Server
http://www.aidaweb.si

Attachment: unicode.1.cs.gz (33K)
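The "scaling" of at:put: described in the patch summary can be illustrated with a small sketch. This is Python, not the Squeak code from the changeset, and everything in it is an assumption made for illustration: the class name `ScalingString`, the helper `width_for`, the 0-based indexing, and the big-endian cell layout. The only idea taken from the message is that storing a wider character switches the string to a 2-byte or 4-byte representation.

```python
def width_for(code_point: int) -> int:
    """Return the narrowest cell width (1, 2 or 4 bytes) that holds code_point."""
    if code_point < 0 or code_point > 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if code_point <= 0xFF:
        return 1          # fits a ByteString-like cell
    if code_point <= 0xFFFF:
        return 2          # fits a TwoByteString-like cell
    return 4              # needs a FourByteString-like cell


class ScalingString:
    """Hypothetical fixed-width string that widens itself on demand."""

    def __init__(self, size: int) -> None:
        self.width = 1                    # bytes per character
        self._data = bytearray(size)      # every cell starts as code point 0

    def __len__(self) -> int:
        return len(self._data) // self.width

    def at(self, index: int) -> str:
        if not 0 <= index < len(self):
            raise IndexError(index)
        start = index * self.width
        return chr(int.from_bytes(self._data[start:start + self.width], "big"))

    def at_put(self, index: int, char: str) -> None:
        if not 0 <= index < len(self):
            raise IndexError(index)
        needed = width_for(ord(char))
        if needed > self.width:
            self._widen(needed)           # "scale" to the wider representation
        start = index * self.width
        self._data[start:start + self.width] = ord(char).to_bytes(self.width, "big")

    def _widen(self, new_width: int) -> None:
        # Re-encode every existing character with the wider cell size.
        old_chars = [self.at(i) for i in range(len(self))]
        self._data = bytearray(
            b"".join(ord(c).to_bytes(new_width, "big") for c in old_chars)
        )
        self.width = new_width
```

In this sketch a string never narrows again once it has been widened; whether the actual patch does so is not stated in the message.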
Hi Janko,
did you try to load your changeset in a Squeak 3.10 image? What is the status of the tests?

If your changeset is good enough and if you write unit tests, it may be interesting to put your changeset into 3.10.

Bye

2007/6/12, Janko Mivšek <[hidden email]>:
> Please find attached an Unicode patch, which deals with improvements of
> internal representation of Unicode characters. [...]

--
Damien Cassou
On 6/13/07 10:18 PM, "Damien Cassou" <[hidden email]> wrote:

> Hi Janko, did you try to load your changeset in a Squeak 3.10 image? What is
> the status of the tests? If your changeset is good enough and if you write
> unit tests, it may be interesting to put your changeset into 3.10.

Damien, Janko's AIDA/Web works. I tested it yesterday and am trying to collaborate with him.

Edgar
In reply to this post by Damien Cassou-3
Just from glancing at the code this cannot possibly be right.
Like, in many places the isWideString test is simply replaced with isFourByteString. But the distinction we need to make is whether we have character values below 256 or above (for example to choose between the old and the MultiByteScanner). So #isWideString needs to be preserved and answer true for all Strings that have character values >= 256.

As for the internal representation of TwoByteStrings; I'm not sure using big endian on all platforms is a good idea. Should certainly be discussed - like, it might be valuable to hand that string to a primitive and then platform order would be better.

Also, the renaming of WideString without providing proper conversion methods will most certainly break existing projects.

Then there are a lot of nits to pick - like the class comments are wrong, ByteString>>replaceFrom:... only creates 32 bit strings, bitShift is used all over the place when Smalltalk code traditionally uses * and //, what is TwoByteString>>printString good for, why does TwoByteString>>asByteString do an unnecessary copy etc.

Before inclusion this still needs a lot of work and testing.

- Bert -

On Jun 14, 2007, at 3:18 , Damien Cassou wrote:
> did you try to load your changeset in a Squeak 3.10 image? What is the
> status of the tests?
>
> If your changeset is good enough and if you write unit tests, it may
> be interesting to put your changeset into 3.10.
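Two of the review points, the meaning of #isWideString and the byte order of two-byte strings, can be made concrete with a short sketch. It is written in Python rather than Smalltalk, and the function names `is_wide_string` and `encode_two_byte` are hypothetical; neither comes from the changeset or from Squeak. The first function tests character values rather than storage class, as the review asks; the second shows how the same character produces different bytes in big-endian and little-endian order, which matters if the raw bytes are handed to platform code.

```python
import sys


def is_wide_string(text: str) -> bool:
    """Answer True if any character value is >= 256.

    The test looks at character values only, so it gives the same answer
    whether the characters happen to be stored in 2-byte or 4-byte cells.
    """
    return any(ord(ch) >= 256 for ch in text)


def encode_two_byte(text: str, byteorder: str = sys.byteorder) -> bytes:
    """Pack each character into a 2-byte cell using the given byte order.

    byteorder is "big" or "little"; it defaults to the platform order.
    """
    out = bytearray()
    for ch in text:
        code_point = ord(ch)
        if code_point > 0xFFFF:
            raise ValueError("character does not fit in two bytes")
        out += code_point.to_bytes(2, byteorder)
    return bytes(out)


if __name__ == "__main__":
    print(is_wide_string("Squeak"))              # only values below 256
    print(is_wide_string("Miv\u0161ek"))         # contains U+0161
    print(encode_two_byte("\u0161", "big").hex())
    print(encode_two_byte("\u0161", "little").hex())
```

A consumer that reads the cells as native 16-bit integers sees the intended value only when the stored order matches the platform order, which is the trade-off raised in the review.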
In reply to this post by Damien Cassou-3
Hi Damien,
Damien Cassou wrote:

> did you try to load your changeset in a Squeak 3.10 image? What is the
> status of the tests?
>
> If your changeset is good enough and if you write unit tests, it may
> be interesting to put your changeset into 3.10.

That would be nice. I just don't yet know the procedure by which patches from the community go through all the tests and careful eyes to be included in the main image. Is this written down somewhere? And for a start, where can I find 3.10?

Best regards
Janko
In reply to this post by Bert Freudenberg
> Just from glancing at the code this cannot possibly be right.
> [...]
> Before inclusion this still needs a lot of work and testing.

Sounds like it. Thanks for the feedback, Bert.

Stef
In reply to this post by Janko Mivšek
On 6/14/07 3:23 PM, "Janko Mivšek" <[hidden email]> wrote:

> I just don't know yet a procedure how patches from
> community goes through all tests and careful eyes to be included in main
> image. Is this written down somewhere. And for start, where can I find 3.10?

Janko:

You can read about 3.10 at http://wiki.squeak.org/squeak/5919 and follow the links from there.

At http://wiki.squeak.org/squeak/5990 you can complain about how 3.10 is going.

The last published image is at http://ftp.squeak.org/3.10alpha/Squeak3.10alpha.7105.zip

I hope to fix some mistakes soon and update to 7113 and beyond.

About packages, they must go into Package Universes now. The image is going in the smaller direction, to converge with Pavel's work.

Ralph is extending the quality control of the image to packages; this work has just begun.

Edgar
Edgar, sorry to repeat it, but could you send to the list the changes that have been harvested?

How can you expect people to trust this image if we do not know what is harvested and busy people are not given a chance to comment? Bert's feedback really illustrates that problem.

Stef
In reply to this post by Edgar J. De Cleene
> About packages, they must go into Package Universes now.

Why? What does it mean?

> Image is going in the smaller direction to converge with Pavel works.
>
> Ralph extend the quality control of image to packages , this work just
> begin.

How?
In reply to this post by stephane ducasse
On 6/15/07 5:28 AM, "stephane ducasse" <[hidden email]> wrote:

> edgar sorry to repeat it but could you send to the list the changes
> that have been harvested.
> How can you expect that people trust this image if we do not know
> what is harvested and
> not give a chance to busy people to give a comment.
> The feedback of bert illustrates really that problem.
>
> Stef

If you read the swiki ....

I know you wish me out of the team, so write to Ralph and give me a break.

Edgar