Hello Juan, Angel, Janko, Germán, Kean and others.
Thank you for sharing ideas and engaging in this discussion.

As we are now discussing possible Unicode support on the conceptual level, the question arises:

What are the minimal requirements for Unicode support in Cuis?

Let me state how I see it (a perception from outside, leaving implementation issues aside):

## First of all, surely

Support for reading and writing UTF-8 encoded files. To a certain extent this is already the case, but it should be improved.

This may or may not mean dropping ISO 8859-15 completely as a file encoding format.

## Secondly

It should be possible to port web libraries like Swazoo, WebClient, Aida, Zinc, Altitude and others without problems. This does not necessarily mean that the files appear 'nicely' in Cuis.

## Thirdly

Unicode support for display in Cuis. Probably still the major European languages plus special symbols: more or less the current state plus a few more symbols.

## Fourthly

A foundation on which people who want to add further support (e.g. Korean, Russian, Japanese) can go ahead and implement add-on libraries.

## Focus

We should focus on 1) and 2) first.

As of now I think I am well on the way to supporting this with an external library. As for changes in Cuis itself, I think we should be careful not to add too much complexity. I prefer Cuis to remain lean.

## CONCLUSION

Let us come to an agreement on

a) what the minimal support for Unicode should be;

b) what Juan should add to Cuis and what should reside in external libraries.

In the meantime I'll update the notes at

https://github.com/hhzl/Cuis-Add-Ons/blob/master/UnicodeNotes.md

describing what the current Unicode support is.

Kind regards
Hannes

_______________________________________________
Cuis mailing list
[hidden email]
http://jvuletich.org/mailman/listinfo/cuis_jvuletich.org
Hi Hannes, Folks,
I agree with all you say, and I have started working a bit on the needed support in base Cuis.

For your points 1) and 2), I published a few updates. You recently implemented NCRs (e.g. &#123123;) when converting UTF-8 to String. I added hexadecimal besides decimal and, most importantly, lossless conversion back to UTF-8. This would allow editing any UTF-8 text file, or copy/paste to external editors (without being able to actually see any character outside Latin, of course). I haven't hooked it into the system yet. It should also be enough to serve any Unicode content in web servers such as Aida.

I also improved the style a bit, so I moved and renamed methods. I also made UTF-8 be stored in ByteArrays and not Strings. This opens the path to alternative String representations, and makes more sense to me.

Unfortunately, this means I broke the examples at https://github.com/hhzl/Cuis-Add-Ons/blob/master/UnicodeNotes.md . I apologize for that.

Please take a look.

Cheers,
Juan Vuletich
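The invertible conversion Juan describes in this post (characters outside Latin-9 become NCRs on the way in, and NCRs are turned back into UTF-8 on the way out) can be sketched roughly as follows. This is a language-neutral illustration in Python, not the Cuis implementation: the function names `from_utf8` and `as_utf8` merely mirror the selectors mentioned in the thread, and the regular expression and error handling are assumptions.

```python
import re

# Assumption: the internal encoding is ISO 8859-15 (Latin-9), as the thread states.
INTERNAL = "iso8859_15"

def from_utf8(data: bytes) -> str:
    """Decode UTF-8; characters outside Latin-9 become decimal NCRs."""
    out = []
    for ch in data.decode("utf-8"):
        try:
            ch.encode(INTERNAL)          # representable internally: keep as is
            out.append(ch)
        except UnicodeEncodeError:
            out.append(f"&#{ord(ch)};")  # not representable: embed an NCR
    return "".join(out)

# Matches decimal (&#945;) and hexadecimal (&#x3B1;) NCRs.
_NCR = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")

def _decode_ncr(match: re.Match) -> str:
    code = int(match.group(1), 16) if match.group(1) else int(match.group(2))
    valid = code <= 0x10FFFF and not 0xD800 <= code <= 0xDFFF
    return chr(code) if valid else match.group(0)  # leave invalid NCRs untouched

def as_utf8(text: str, convert_ncrs: bool = True) -> bytes:
    """Encode to UTF-8, optionally turning embedded NCRs back into characters."""
    if convert_ncrs:
        text = _NCR.sub(_decode_ncr, text)
    return text.encode("utf-8")

original = "abc àè€ αβγ".encode("utf-8")
internal = from_utf8(original)   # 'abc àè€ &#945;&#946;&#947;'
round_trip = as_utf8(internal)   # identical to the original bytes
```

One limitation of this sketch: text that already contains a literal NCR such as `&#945;` would be converted to a character on the way out, so the round trip is only exact for input without literal NCRs. Whether and how Cuis handles that case is not stated in the thread.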
Hello Juan
On 2/8/13, Juan Vuletich <[hidden email]> wrote:
> For your points 1) and 2), I published a few updates. You recently
> implemented NCRs (e.g. &#123123;) when converting UTF-8 to String.
> I added hex besides decimal, and most important, lossless conversion
> back to UTF-8.

Good. I have seen the parsing in the new method

    String>>asUtf8: convertEmbeddedNCRs

> This would allow editing any UTF-8 text file, or copy/paste to external
> editors (without being able to actually see any char outside Latin, of
> course).

Yes, this will indeed expand the range of possible applications for Cuis.

> I didn't hook it yet into the system. It should also be enough
> to serve any Unicode content in web servers such as Aida.

Yes. If you google for 'statistics character encoding html', the first hit claims that 75% of all web pages are encoded as UTF-8.

> I also enhanced a bit the style, so I moved and renamed methods. I also
> made UTF-8 to be stored in ByteArrays and not Strings. This opens the
> path to alternative String representations, and makes more sense to me.

String is

    ArrayedCollection variableByteSubclass: #String

and ByteArray is a sibling class

    ArrayedCollection variableByteSubclass: #ByteArray

That is fine.

> Unfortunately, this means I broke the examples at
> https://github.com/hhzl/Cuis-Add-Ons/blob/master/UnicodeNotes.md . I
> apologize for that.

I do not mind :-) I'll update them.

> Please take a look.

I did, and I will do more. The next question is how to hook these Unicode utility methods into the system.

Thank you for the updates in Cuis.

Regards
--Hannes
Hello Juan
On 2/8/13, Juan Vuletich <[hidden email]> wrote:
> Unfortunately, this means I broke the examples at
> https://github.com/hhzl/Cuis-Add-Ons/blob/master/UnicodeNotes.md .

I have updated the file UnicodeNotes.md, and I wrote a test class (attached) which shows how to read and write a UTF-8 file.

    test5ReadWriteUtf8
        "see UnicodeNotes.md"
        "self new test5ReadWriteUtf8"
        | stream content byteArray byteArray2 |

        "read UTF8 Unicode file into internal string with NCRs"
        "for NCR see http://en.wikipedia.org/wiki/Numeric_character_reference"
        stream := (FileStream fileNamed: self class fileName) binary.
        byteArray := stream contentsOfEntireFile.
        content := String fromUtf8: byteArray.
        "NCRs were added to 'content' as needed"

        "write internal string back to UTF8 file with NCRs converted back to UTF8 chars"
        stream := (FileStream forceNewFileNamed: self class fileName2) binary.
        stream nextPutAll: (content asUtf8: true).    "true means: convert NCRs back to UTF8"
        stream close.

        "compare the two versions: what is in file 'fileName' with what is in file 'fileName2'"
        stream := (FileStream fileNamed: self class fileName) binary.
        byteArray := stream contentsOfEntireFile.
        stream close.

        stream := (FileStream fileNamed: self class fileName2) binary.
        byteArray2 := stream contentsOfEntireFile.
        stream close.

        self assert: byteArray = byteArray2.

BTW, according to the section 'Official name and variants' of http://en.wikipedia.org/wiki/UTF8, UTF-8 should be written all in uppercase.

As of now I can use Cuis 4.1-1590 as is for my work, which includes reading and writing UTF-8 encoded text files (including HTML files). So as far as I am concerned, further extension of the Cuis Unicode support can be put on the back burner for some time.

However, it might still be worthwhile to consider maintaining a TextConverter and a UTF8Converter class for compatibility and other reasons. More on this later.

Thank you for the update

https://github.com/jvuletich/Cuis/blob/master/UpdatesSinceLastRelease/1590-InvertibleUTF8Conversion-JuanVuletich-2013Feb08-08h11m-jmv.1.cs.st

and kind regards

Hannes Hirzel

Attachment: UnicodeTest4dot1dash1590.st (5K)
P.S.
To read a UTF-8 file in Cuis 4.1-1590 I do

    String fromUtf8: (FileStream fileNamed: 'anUTF8file.txt') contentsOfEntireFile

To write a UTF-8 file I do

    (FileStream forceNewFileNamed: 'aFileName.txt')
        binary;
        nextPutAll: ('abc àè€ αβγ' asUtf8: true);
        " 'true' means 'convert Numerical Character References back to UTF8' "
        close.
In reply to this post by Hannes Hirzel
Hi Hannes,
Thanks for this. I added your new test.

BTW, in later changes I tweaked the protocol a bit, adding a flag to skip a trailing null. This was needed for the Windows clipboard. Can you update the examples again?

BTW, some of the Cuis methods in UnicodeNotes.md are outdated as well...

Cheers,
Juan Vuletich
In reply to this post by Hannes Hirzel
Hi Hannes,
On 2/13/2013 2:46 PM, H. Hirzel wrote:
> I have updated the file UnicodeNotes.md

Thanks. Just a small correction. You say "Note: #utf8ToISO8859s15 is only used by the clipboard.", but that's no longer true, as #fromUtf8: is used now.
In reply to this post by Juan Vuletich-4
Hello Juan
On 2/21/13, Juan Vuletich <[hidden email]> wrote:
> Thanks for this. I added your new test.

Yes, thank you. I have seen

    StringTest new testAsUtf8

and

    StringTest new testAsUtf8WithNCRs

or is it something different?

> BTW, in later changes, I tweaked a bit the protocol, adding a flag to
> skip a trailing null. This was needed for Windows clipboard.

fromUtf8 is now

    String>>fromUtf8: aByteArray hex: useHexForNCRs trimLastNull: doTrimLastNullChar
        "Convert the given string from UTF-8 to the internal encoding: ISO Latin 9 (ISO 8859-15)"
        "For unicode chars not in ISO Latin 9 (ISO 8859-15), embed Decimal NCRs or Hexadecimal NCRs according to useHex.

        See http://en.wikipedia.org/wiki/Numeric_character_reference
        See http://rishida.net/tools/conversion/. Tests prepared there.

        Note: The conversion of NCRs is reversible. See #asUtf8:
        This allows handling the full Unicode in Cuis tools, that can only display the Latin alphabet, by editing the NCRs.
        The conversions can be done when reading / saving files, or when pasting from Clipboard and storing back on it."

> Can you update the examples again?

Which examples? I have added you as a collaborator on https://github.com/hhzl/Cuis-Add-Ons so that you can mark them directly through the GitHub web interface.

> BTW, some of the Cuis methods in
> UnicodeNotes.md are outdated as well...

OK, noted.

Regards
--Hannes
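The extended selector quoted in this post, `fromUtf8:hex:trimLastNull:`, adds two switches: decimal versus hexadecimal NCRs, and dropping a trailing null such as the one the Windows clipboard appends. A minimal Python sketch of that behaviour follows. It is an illustration under assumptions, not the Cuis code: the parameter names `use_hex` and `trim_last_null` only paraphrase the Smalltalk keywords, and the exact NCR spelling Cuis emits (for instance upper versus lower case hex digits) is not given in the thread.

```python
def from_utf8(data: bytes, use_hex: bool = False, trim_last_null: bool = False) -> str:
    """Decode UTF-8 to a Latin-9-safe string, embedding NCRs where needed."""
    # Clipboard data may end in a single NUL terminator; drop it on request.
    if trim_last_null and data.endswith(b"\x00"):
        data = data[:-1]
    out = []
    for ch in data.decode("utf-8"):
        try:
            ch.encode("iso8859_15")      # representable in Latin-9: keep
            out.append(ch)
        except UnicodeEncodeError:
            # Assumption: uppercase hex digits; the thread does not specify.
            out.append(f"&#x{ord(ch):X};" if use_hex else f"&#{ord(ch)};")
    return "".join(out)

# Hypothetical clipboard payload: Greek alpha followed by a NUL terminator.
clipboard = "α".encode("utf-8") + b"\x00"

decimal_form = from_utf8(clipboard, trim_last_null=True)            # '&#945;'
hex_form = from_utf8(clipboard, use_hex=True, trim_last_null=True)  # '&#x3B1;'
untrimmed = from_utf8(clipboard)                                    # '&#945;\x00'
```

Keeping the trimming behind a flag means file contents, where a final null byte would be real data, pass through unchanged by default.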
In reply to this post by Juan Vuletich-4
On 2/21/13, Juan Vuletich <[hidden email]> wrote:
> Thanks. Just a small correction. You say "Note: #utf8ToISO8859s15 is
> only used by the clipboard.", but that's no longer true, as now
> #fromUtf8: is used.

Done.
--HJH
In reply to this post by Hannes Hirzel
Hi Hannes,
I had trouble with my internet connection. The updates are now on GitHub. Please check them. The test I added is your UnicodeTest, verbatim.

Cheers,
Juan Vuletich