(Minimal) requirements for Unicode support?

Hannes Hirzel
Hello Juan, Angel, Janko, Germán, Kean and others.

Thank you for sharing ideas and engaging in this discussion.

As we are now discussing possible Unicode support on the conceptual
level, the question arises:

    What are the minimal requirements for Unicode support for Cuis?

Let me state how I see it (a view from the outside, leaving implementation issues aside):


## First of all surely


Support for reading and writing UTF8 encoded files.
To a certain extent this is the case as of now, but it should be improved.

This may or may not mean dropping ISO8859-15 completely as a file encoding format.





## Secondly

It should be possible to port web libraries like Swazoo, WebClient,
Aida, Zinc, Altitude and others without problems. This does not
necessarily mean that the files appear 'nicely' in Cuis.


## Thirdly

Unicode support for display in Cuis. Probably still the major European
languages plus special symbols. More or less the current state plus a
few more symbols.


## Fourthly

A foundation on which people who want to add further support
(e.g. Korean, Russian, Japanese) can go ahead and implement
add-on libraries.



## Focus

We should focus on 1) and 2) first.

I think I am well on the way to supporting this with an external
library. As for changes to Cuis itself, I think we should be
careful not to add too much complexity.
I prefer Cuis to remain lean.


## CONCLUSION

Let us come to an agreement on

a) what the minimal support for Unicode should be.

b) what Juan should add to Cuis itself and what should reside in
external libraries.


In the meantime I'll update the notes

   https://github.com/hhzl/Cuis-Add-Ons/blob/master/UnicodeNotes.md

to describe what the current Unicode support is.


Kind regards
Hannes

_______________________________________________
Cuis mailing list
[hidden email]
http://jvuletich.org/mailman/listinfo/cuis_jvuletich.org

Re: (Minimal) requirements for Unicode support?

Juan Vuletich-4
Hi Hannes, Folks,

I agree with everything you say, and I have started working a bit on the
needed support in base Cuis.

For your points 1) and 2), I published a few updates. You recently
implemented numeric character references (NCRs, e.g. &#123123;) when
converting UTF-8 to String. I added hex besides decimal and, most
importantly, lossless conversion back to UTF-8.
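
A minimal workspace sketch of that round trip, assuming the `String fromUtf8:` and `asUtf8:` selectors that are used further down in this thread (Cuis 4.1-1590). The sample text is made up for illustration, and the final assertion states what the lossless conversion is expected to guarantee rather than a verified result:

```smalltalk
"Workspace sketch; assumes Cuis 4.1-1590 or later."
| utf8Bytes internal roundTripped |

"Build UTF-8 bytes containing non-Latin characters (Greek alpha, beta, gamma).
 'true' asks for embedded NCRs to be converted to UTF-8 sequences."
utf8Bytes := 'abc &#945;&#946;&#947;' asUtf8: true.

"UTF-8 -> internal String. Characters outside ISO 8859-15 become NCRs."
internal := String fromUtf8: utf8Bytes.

"Internal String -> UTF-8, turning the NCRs back into UTF-8 sequences."
roundTripped := internal asUtf8: true.

"If the conversion is lossless, both byte sequences are equal."
self assert: utf8Bytes = roundTripped.
```

The intermediate `internal` string is what a Cuis editor would show: plain Latin text with NCRs standing in for everything it cannot display.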

This would allow editing any UTF-8 text file, or copy/paste to external
editors (without being able to actually see any character outside Latin,
of course). I haven't hooked it into the system yet. It should also be
enough to serve any Unicode content in web servers such as Aida.

I also improved the style a bit, moving and renaming methods. In addition,
UTF-8 is now stored in ByteArrays and not in Strings. This opens the
path to alternative String representations, and makes more sense to me.
Unfortunately, this means I broke the examples at
https://github.com/hhzl/Cuis-Add-Ons/blob/master/UnicodeNotes.md . I
apologize for that.

Please take a look.

Cheers,
Juan Vuletich

Re: (Minimal) requirements for Unicode support?

Hannes Hirzel
Hello Juan

On 2/8/13, Juan Vuletich <[hidden email]> wrote:
> Hi Hannes, Folks,
>
> I agree with all you say, and I started working a bit on the needed
> support in base Cuis.

> For your points 1) and 2), I published a few updates. You recently
> implemented NCRs (i.e. like &#123123;) when converting UTF-8 to String.
> I added hex besides decimal, and most important, loss less conversion
> back to UTF-8.

Good. I have seen the parsing in the new method
String>>asUtf8: convertEmbeddedNCRs


> This would allow editing any UTF-8 text file, or copy/paste to external
> editors (without being able to actually see any char outside Latin, of
> course).

Yes, this indeed will expand the area of application possibilities for Cuis.


> I didn't hook it yet into the system. It should also be enough
> to serve any Unicode content in web servers such as Aida.

Yes,
If you google for
   statistics character encoding  html

the first hit claims that 75% of all web pages are encoded as UTF8.




> I also enhanced a bit the style, so I moved and renamed methods. I also
> made UTF-8 to be stored in ByteArrays and not Strings. This opens the
> path to alternative String representations, and makes more sense to me.

String is
   ArrayedCollection variableByteSubclass: #String

and ByteArray is a sibling class
   ArrayedCollection variableByteSubclass: #ByteArray

That is fine.

> Unfortunately, this means I broke the examples at
> https://github.com/hhzl/Cuis-Add-Ons/blob/master/UnicodeNotes.md . I
> apologize for that.

I do not mind :-)   I'll update them.

> Please take a look.

I did and will do more.

The next question is now how to hook these Unicode utility methods
into the system.

Thank you for the updates in Cuis.

Regards

--Hannes


Re: (Minimal) requirements for Unicode support?

Hannes Hirzel
Hello Juan

> On 2/8/13, Juan Vuletich <[hidden email]> wrote:

>> Unfortunately, this means I broke the examples at
>> https://github.com/hhzl/Cuis-Add-Ons/blob/master/UnicodeNotes.md .

I have updated the file UnicodeNotes.md

and I wrote a test class (attached) which shows how to read and write a
UTF8 file.

test5ReadWriteUtf8
    "see UnicodeNotes.md"
    "self new test5ReadWriteUtf8"
    | stream content byteArray byteArray2 |

    "read UTF8 Unicode file into internal string with NCRs"
    "for NCR see http://en.wikipedia.org/wiki/Numeric_character_reference"
    stream := (FileStream fileNamed: self class fileName) binary.
    byteArray := stream contentsOfEntireFile.
    content := String fromUtf8: byteArray.
    "NCRs were added to 'content' as needed"

    "write internal string back to UTF8 file with NCRs converted back to UTF8 chars"
    stream := (FileStream forceNewFileNamed: self class fileName2) binary.
    stream nextPutAll: (content asUtf8: true).  "true means: convert NCRs back to UTF8"
    stream close.

    "compare the two versions: what is in file 'fileName' with what is in file 'fileName2'"
    stream := (FileStream fileNamed: self class fileName) binary.
    byteArray := stream contentsOfEntireFile.
    stream close.

    stream := (FileStream fileNamed: self class fileName2) binary.
    byteArray2 := stream contentsOfEntireFile.
    stream close.

    self assert: byteArray = byteArray2.


BTW, according to the section 'Official name and variants' of
http://en.wikipedia.org/wiki/UTF8, the name should be written all uppercase.

As of now I can use Cuis 4.1-1590 as is for my work, which includes
reading and writing UTF8 encoded text files (including HTML files). So
as far as I am concerned, further extension of the Cuis Unicode support
might be put on the back burner for some time.

However, it might still be worthwhile to consider maintaining a
TextConverter and a UTF8Converter class for compatibility and other
reasons. More on this later.

Thank you for the update

https://github.com/jvuletich/Cuis/blob/master/UpdatesSinceLastRelease/1590-InvertibleUTF8Conversion-JuanVuletich-2013Feb08-08h11m-jmv.1.cs.st

and

kind regards

Hannes Hirzel

Attachment: UnicodeTest4dot1dash1590.st (5K)

Re: (Minimal) requirements for Unicode support?

Hannes Hirzel
P.S.

To read a UTF8 file in Cuis 4.1-1590 I do

    String fromUtf8:
       (FileStream fileNamed: 'anUTF8file.txt') contentsOfEntireFile


To write a UTF8 file I do

    (FileStream forceNewFileNamed:  'aFileName.txt')
        binary;
        nextPutAll: ('abc àè€ &#945;&#946;&#947;' asUtf8: true);
        " 'true' means 'convert Numerical Character References back to UTF8'   "
        close.



Re: (Minimal) requirements for Unicode support?

Juan Vuletich-4
In reply to this post by Hannes Hirzel
Hi Hannes,

Thanks for this. I added your new test.

BTW, in later changes I tweaked the protocol a bit, adding a flag to skip a trailing null. This was needed for the Windows clipboard. Can you update the examples again? Also, some of the Cuis methods in UnicodeNotes.md are outdated as well...

Cheers,
Juan Vuletich

Re: (Minimal) requirements for Unicode support?

Juan Vuletich-4
In reply to this post by Hannes Hirzel
Hi Hannes,

On 2/13/2013 2:46 PM, H. Hirzel wrote:
>> Unfortunately, this means I broke the examples at
>> https://github.com/hhzl/Cuis-Add-Ons/blob/master/UnicodeNotes.md .
> I have updated the file UnicodeNotes.md

Thanks. Just a small correction. You say "Note: #utf8ToISO8859s15 is only used by the clipboard.", but that's no longer true, as now #fromUtf8: is used.

Re: (Minimal) requirements for Unicode support?

Hannes Hirzel
In reply to this post by Juan Vuletich-4
Hello Juan

On 2/21/13, Juan Vuletich <[hidden email]> wrote:
> Thanks for this. I added your new test.

Yes, thank you. I have seen
     StringTest new testAsUtf8
and
     StringTest new testAsUtf8WithNCRs

or is it something different?




> BTW, in later changes, I tweaked a bit the protocol, adding a flag to
> skip a trailing null. This was needed for Windows clipboard.

fromUtf8: is now


String class>>
fromUtf8: aByteArray hex: useHexForNCRs trimLastNull: doTrimLastNullChar
    "Convert the given string from UTF-8 to the internal encoding: ISO Latin 9 (ISO 8859-15)"
    "For unicode chars not in ISO Latin 9 (ISO 8859-15), embed Decimal NCRs or Hexadecimal NCRs according to useHex.

    See http://en.wikipedia.org/wiki/Numeric_character_reference
    See http://rishida.net/tools/conversion/. Tests prepared there.

    Note: The conversion of NCRs is reversible. See #asUtf8:
    This allows handling the full Unicode in Cuis tools, that can only display the Latin alphabet, by editing the NCRs.
    The conversions can be done when reading / saving files, or when pasting from Clipboard and storing back on it."
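
A small workspace sketch of how the extended selector might be called, assuming an image that contains `fromUtf8:hex:trimLastNull:` as quoted above. The meaning of the two flags is inferred only from the parameter names and the method comment, and the sample text is made up:

```smalltalk
"Workspace sketch; assumes the class-side selector quoted above is present."
| utf8Bytes decimalForm hexForm |

"UTF-8 bytes containing Greek alpha, beta, gamma."
utf8Bytes := 'abc &#945;&#946;&#947;' asUtf8: true.

"Ask for decimal NCRs; do not trim a trailing null."
decimalForm := String fromUtf8: utf8Bytes hex: false trimLastNull: false.

"Ask for hexadecimal NCRs instead."
hexForm := String fromUtf8: utf8Bytes hex: true trimLastNull: false.

"Since the conversion is described as reversible, both forms
 should convert back to the same UTF-8 bytes."
self assert: (decimalForm asUtf8: true) = utf8Bytes.
self assert: (hexForm asUtf8: true) = utf8Bytes.
```

Passing `true` for `trimLastNull:` would presumably be the clipboard case mentioned above, where the incoming bytes end with a null.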



> Can you update the examples again?

Which examples? I have added you as a collaborator for
https://github.com/hhzl/Cuis-Add-Ons
so that you can mark it directly through the github web interface.


> BTW, some of the Cuis methods in UnicodeNotes.md are outdated as well...

OK, noted.

Regards
--Hannes


Re: (Minimal) requirements for Unicode support?

Hannes Hirzel
In reply to this post by Juan Vuletich-4
On 2/21/13, Juan Vuletich <[hidden email]> wrote:

> Hi Hannes,
>
> On 2/13/2013 2:46 PM, H. Hirzel wrote:
>> Hello Juan
>>
>>> On 2/8/13, Juan Vuletich<[hidden email]>  wrote:
>>>> Unfortunately, this means I broke the examples at
>>>> https://github.com/hhzl/Cuis-Add-Ons/blob/master/UnicodeNotes.md .
>> I have updated the file UnicodeNotes.md
>
> Thanks. Just a small correction. You say "Note: #utf8ToISO8859s15 is
> only used by the clipboard.", but that's no longer true, as now
> #fromUtf8: is used.

DONE

--HJH


Re: (Minimal) requirements for Unicode support?

Juan Vuletich-4
In reply to this post by Hannes Hirzel
Hi Hannes,

I had trouble with my internet connection. The updates are now on
GitHub. Please check them. The test I added is your UnicodeTest, verbatim.

Cheers,
Juan Vuletich
