What is with the UTF7 encoder?


Terry Raymond

I need a UTF-7 encoder/decoder, so I loaded NetClientBase, tried the following, and got an exception:

('some text' asByteArrayEncoding: #utf_7) asStringEncoding: #utf_7

 

Terry

 

===========================================================

Terry Raymond

Crafted Smalltalk

80 Lazywood Ln.

Tiverton, RI  02878

(401) 624-4517      [hidden email]

===========================================================

 


_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

Re: What is with the UTF7 encoder?

Kogan, Tamara

Here is an example from our SUnit tests:

    str := ('testöäü' asByteArrayEncoding: #'utf_7') asByteString.
    stream := (ByteArray new withEncoding: #UTF_7) writeStream.
    stream nextPutAll: 'testöäü'.
    stream close.
    self assert: str = (stream encodedContents withEncoding: #ASCII) readStream contents

 

Tamara Kogan

Smalltalk Development

Cincom Systems

 


Re: What is with the UTF7 encoder?

Boris Popov, DeepCove Labs (SNN)
In reply to this post by Terry Raymond

That doesn’t quite answer the question of why Terry can’t just go to bytes and back with an identical encoder?

 

-Boris

 


Re: What is with the UTF7 encoder?

Kogan, Tamara

I guess it is because, since 1999, the method String class>>fromIntegerArray:encoding: has been implemented to decode into a ByteString, not a ByteArray.

 

Tamara Kogan

Smalltalk Development

Cincom Systems

 


Re: What is with the UTF7 encoder?

Boris Popov, DeepCove Labs (SNN)

('балалайка' asByteArrayEncoding: #utf8) asStringEncoding: #utf8 => 'балалайка'

 

Why not UTF7?
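
A minimal sketch of a round-trip check over the sample strings that appear in this thread. It assumes only the two conversion messages already shown here (asByteArrayEncoding: and asStringEncoding:) and the #utf_7 encoding name; the temporary names are illustrative, and any error raised by the encoder is simply counted as a failure.

```smalltalk
| samples failures |
"Sample strings taken from the messages in this thread."
samples := #('some text' 'testöäü' 'балалайка' '数据').
failures := OrderedCollection new.
samples do: [:each |
    | roundTrip |
    "Encode to bytes and decode back; answer nil if either step raises an error."
    roundTrip := [(each asByteArrayEncoding: #utf_7) asStringEncoding: #utf_7]
        on: Error
        do: [:ex | ex return: nil].
    roundTrip = each ifFalse: [failures add: each]].
"Inspect this collection: it holds every sample that did not survive the round trip."
failures
```

An empty collection would mean every sample survived the trip to bytes and back.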

 

-Boris

 


Re: What is with the UTF7 encoder?

Terry Raymond
In reply to this post by Kogan, Tamara

Tamara

 

You should try:

('数据' asByteArrayEncoding: #utf_7)

 

Terry

 


Re: What is with the UTF7 encoder?

Terry Raymond
In reply to this post by Kogan, Tamara

Tamara

 

Your tests should also cover non-ASCII characters. Look at http://msdn.microsoft.com/en-us/library/system.text.encoding.utf7.aspx and see if you can produce the same results using their test case.

 

Also, try a test using Chinese characters.
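
For comparison against whatever the encoder produces, the shifted sequence for a run of non-ASCII characters can be worked out by hand following RFC 2152: take the 16-bit UTF-16 code units, concatenate them big-endian, and emit them 6 bits at a time in the Base64 alphabet between + and -. The sketch below is plain integer arithmetic and does not use UTF7StreamEncoder at all; every variable name in it is illustrative.

```smalltalk
| alphabet codes bits nBits out |
alphabet := 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'.
codes := #(16rF6 16rE4 16rFC).   "ö ä ü as 16-bit code units"
bits := 0.      "bit accumulator"
nBits := 0.     "number of pending bits in the accumulator"
out := WriteStream on: String new.
out nextPut: $+.                 "shift in"
codes do: [:code |
    bits := (bits bitShift: 16) + code.
    nBits := nBits + 16.
    "Emit one Base64 character for every complete group of 6 bits."
    [nBits >= 6] whileTrue: [
        nBits := nBits - 6.
        out nextPut: (alphabet at: ((bits bitShift: nBits negated) bitAnd: 63) + 1).
        bits := bits bitAnd: (1 bitShift: nBits) - 1]].
"Pad any leftover bits with zeros on the right to fill a final sextet."
nBits > 0 ifTrue: [
    out nextPut: (alphabet at: ((bits bitShift: 6 - nBits) bitAnd: 63) + 1)].
out nextPut: $-.                 "shift out"
out contents   "'+APYA5AD8-'"
```

So a conforming encoder would be expected to turn 'testöäü' into the ASCII text test+APYA5AD8-, which gives a test a fixed value to compare against.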

 

Terry

 


Re: What is with the UTF7 encoder?

Kogan, Tamara

Terry,

 

Russian 'балалайка' has already killed our UTF7 encoder :)

I see the problem. I am wondering why you need UTF7. Can you use UTF8 instead?

 

Tamara Kogan

Smalltalk Development

Cincom Systems

 


Re: What is with the UTF7 encoder?

Terry Raymond
In reply to this post by Kogan, Tamara

Tamara

 

These are the bugs I found in UTF7StreamEncoder:

1. UTF7StreamEncoder>>nextFrom: answers the numerical code, whereas the class protocol expects it to answer the character.

2. UTF7StreamEncoder>>isNotASCII: treats character codes greater than 255 as ASCII, which does not work.

3. UTF7StreamEncoder>>fillNibbleFrom:

  The line

    (aStream atEnd or: [aStream peek == self shiftOutCode]) ifTrue: [ ^nSextets ].

  should be changed so that it consumes the shiftOutCode before returning:

    (aStream atEnd or: [aStream peek == self shiftOutCode]) ifTrue: [ aStream next. ^nSextets ].

 

Terry

 


Re: What is with the UTF7 encoder?

Kogan, Tamara

Terry,

 

The UTF8 encoder was reworked completely because it became the main internet encoding. Since no one complained about (or used) the utf7 encoding, that encoder was neglected.

I will create an AR, but it would be interesting to know why you need utf7. It will help to define the AR priority.

 

Tamara Kogan

Smalltalk Development

Cincom Systems

 


Re: What is with the UTF7 encoder?

Terry Raymond

We deal with data from many sources; one of them has UTF-7 encoded data.

 

Terry

 


Re: What is with the UTF7 encoder?

Georg Heeg
In reply to this post by Boris Popov, DeepCove Labs (SNN)

Terry and Boris,

 

you are perfectly right.

 

There are actually quite a few bugs in UTF7StreamEncoder.

 

1. Characters with codes > 255, e.g. $€, are not encoded correctly; an exception is thrown during encoding. [I do not have a fix for that.]

2. To fix your problem, UTF7StreamEncoder>>nextFrom: must be fixed. Here is the fix which works for me:

nextFrom: aStream
    "Decode the next byte(s) in the stream and answer the code."

    | code c1 |
    aStream peek isNil ifTrue: [^nil].
    code := aStream peek.
    code = self shiftInCode
        ifTrue:
            [code := aStream next.
            (code = self shiftInCode and: [aStream peek = self shiftOutCode])
                ifTrue:
                    [aStream next.
                    ^code asCharacter].
            self shifting: true].
    (code = self shiftOutCode and: [self shifting])
        ifTrue:
            [(c1 := self shiftInNextFrom: aStream) notNil ifTrue: [^c1 asCharacter].
            self shifting: false.
            aStream next.
            aStream peek isNil ifTrue: [^nil]].
    ^(self shifting
        ifTrue: [self shiftInNextFrom: aStream]
        ifFalse: [aStream next]) asCharacter

 

3. If the string ends with a non-ASCII character like $ö, another exception is thrown. Here is the fix of UTF7StreamEncoder>>shiftInNextFrom: which works for me:

shiftInNextFrom: aStream

    | c1 c2 |
    c1 := super nextFrom: aStream.
    c1 == nil ifTrue: [^nil].
    c2 := super nextFrom: aStream.
    c2 == nil ifTrue: [^nil].
    aStream peek = self shiftOutCode
        ifTrue:
            [aStream next.
            self shifting: false].
    ^(c1 bitShift: 8) + c2
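
The last line of shiftInNextFrom: combines two values into one 16-bit code. A small worked example of that arithmetic, assuming c1 and c2 are the high and low bytes of a single UTF-16 code unit (the byte values below are chosen for illustration and correspond to the first letter of 'балалайка'):

```smalltalk
| c1 c2 code |
c1 := 16r04.   "high byte"
c2 := 16r31.   "low byte"
"Shift the high byte up by 8 bits and add the low byte."
code := (c1 bitShift: 8) + c2.
code   "1073, i.e. 16r0431, the code point of Cyrillic б"
```

This covers only characters that fit in one 16-bit unit; the code as posted does not combine surrogate pairs.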

 

 

Have fun

 

Georg

 

Georg Heeg eK, Dortmund und Köthen, HR Dortmund A 12812

Wallstraße 22, 06366 Köthen

Tel. +49-3496-214328, Fax +49-3496-214712

 


Re: What is with the UTF7 encoder?

Terry Raymond

Georg

 

1. Change isNotASCII: from

isNotASCII: code
    ^(code < 31 or: [code > 127 and: [code < 255]])

to:

isNotASCII: code
    ^(code < 31 or: [code > 127])
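
To see the difference between the two versions, the two predicates can be evaluated side by side as blocks. This is only a sketch of the boolean logic; the block names are illustrative and nothing here touches the encoder itself.

```smalltalk
| original proposed euro |
"The two predicate bodies, copied from the methods above."
original := [:code | code < 31 or: [code > 127 and: [code < 255]]].
proposed := [:code | code < 31 or: [code > 127]].
euro := 8364.   "code point of $€ (U+20AC)"
original value: euro.   "false: the upper bound excludes every code above 254"
proposed value: euro.   "true: any code above 127 is now reported as non-ASCII"
```

Note that the original upper bound also excludes code 255 itself, since 255 < 255 is false.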

 

Terry

 

===========================================================

Terry Raymond

Crafted Smalltalk

80 Lazywood Ln.

Tiverton, RI  02878

(401) 624-4517      [hidden email]

===========================================================

 

From: Georg Heeg [mailto:[hidden email]]
Sent: Tuesday, March 05, 2013 12:06 PM
To: 'Boris Popov, DeepCove Labs'; 'Kogan, Tamara'; 'Terry Raymond'; 'VWNC'
Subject: AW: [vwnc] What is with the UTF7 encoder?

 

Terry and Boris,

 

you are perfectly right.

 

There are actually quite a few bugs in UTF7StreamEncoder.

 

1.       Any character > 255 are not encoded correctly, e.g. $€, an exception is thrown during encoding [I do not have a fix for that]

2.       To fix your problem UTF7StreamEncoder>nextFrom: must be fixed. Here is the fix which works for me:

nextFrom: aStream
    "Decode the next byte(s) in the stream and answer the code."

    | code c1 |
    aStream peek isNil ifTrue: [^nil].
    code := aStream peek.
    code = self shiftInCode
        ifTrue:
            [code := aStream next.
            (code = self shiftInCode and: [aStream peek = self shiftOutCode])
                ifTrue:
                    [aStream next.
                    ^code asCharacter].
            self shifting: true].
    (code = self shiftOutCode and: [self shifting])
        ifTrue:
            [(c1 := self shiftInNextFrom: aStream) notNil ifTrue: [^c1 asCharacter].
            self shifting: false.
            aStream next.
            aStream peek isNil ifTrue: [^nil]].
    ^(self shifting
        ifTrue: [self shiftInNextFrom: aStream]
        ifFalse: [aStream next]) asCharacter

 

3.       If the string ends with a non-ASCII character like $ö, another exception is thrown. Here is the fix for UTF7StreamEncoder>>shiftInNextFrom: that works for me:

shiftInNextFrom: aStream

    | c1 c2 |
    c1 := super nextFrom: aStream.
    c1 == nil ifTrue: [^nil].
    c2 := super nextFrom: aStream.
    c2 == nil ifTrue: [^nil].
    aStream peek = self shiftOutCode
        ifTrue:
            [aStream next.
            self shifting: false].
    ^(c1 bitShift: 8) + c2
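
As a quick check of the last line of shiftInNextFrom:, here is a workspace sketch of how two values are combined into one character code. It assumes that each call to super nextFrom: answers one 8-bit value, high byte first, which is what the shift by 8 suggests; the byte values below are chosen by hand for illustration.

```smalltalk
"Workspace sketch: combine a high byte and a low byte into one code.
 Assumes c1 is the high byte and c2 the low byte, each in 0..255."
| c1 c2 |
c1 := 16r00.
c2 := 16rF6.
Transcript show: ((c1 bitShift: 8) + c2) printString; cr.   "246, the code of $ö"
c1 := 16r20.
c2 := 16rAC.
Transcript show: ((c1 bitShift: 8) + c2) printString; cr.   "8364 = 16r20AC, the code of $€"
```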

 

 

Have fun

 

Georg

 

Georg Heeg eK, Dortmund und Köthen, HR Dortmund A 12812

Wallstraße 22, 06366 Köthen

Tel. +49-3496-214328, Fax +49-3496-214712

 

From: [hidden email] [[hidden email]] On Behalf Of Boris Popov, DeepCove Labs
Sent: Tuesday, March 5, 2013 15:20
To: Kogan, Tamara; Terry Raymond; VWNC
Subject: Re: [vwnc] What is with the UTF7 encoder?

 

That doesn’t quite answer the question of why Terry can’t just go to bytes and back with an identical encoder?

 

-Boris

 


 


_______________________________________________

Re: What is with the UTF7 encoder?

Georg Heeg
In reply to this post by Terry Raymond

I saw that AR 67634 (Modified) (M), "UTF7 Encoder does not work properly", has been created.

 

Georg

 


 

From: [hidden email] [mailto:[hidden email]] On Behalf Of Terry Raymond
Sent: Tuesday, March 5, 2013 17:07
To: 'Kogan, Tamara'; 'Boris Popov, DeepCove Labs'; 'VWNC'
Subject: Re: [vwnc] What is with the UTF7 encoder?

 

We deal with data from many sources; one of them supplies UTF-7 encoded data.

 

Terry

 


 

From: Kogan, Tamara [[hidden email]]
Sent: Tuesday, March 05, 2013 10:39 AM
To: Terry Raymond; Boris Popov, DeepCove Labs; VWNC
Subject: RE: [vwnc] What is with the UTF7 encoder?

 

Terry,

 

The UTF8 encoder was completely reworked because it became the main internet encoding. Since no one used (or complained about) utf7 encoding, that encoder was neglected.

I will create an AR, but it would be interesting to know why you need utf7. That will help define the AR priority.

 

Tamara Kogan

Smalltalk Development

Cincom Systems

 

From: Terry Raymond [[hidden email]]
Sent: Tuesday, March 05, 2013 10:26 AM
To: Kogan, Tamara; 'Boris Popov, DeepCove Labs'; 'VWNC'
Subject: RE: [vwnc] What is with the UTF7 encoder?

 

Tamara

 

These are the bugs I found in UTF7StreamEncoder

1. UTF7StreamEncoder>>nextFrom: answers the numerical code, whereas the class protocol expects it to answer the character.

2. UTF7StreamEncoder>>isNotASCII: treats character codes greater than 255 as ASCII; that does not work.

3. UTF7StreamEncoder>>fillNibbleFrom:

  the line

                (aStream atEnd or: [aStream peek == self shiftOutCode]) ifTrue: [ ^nSextets ].

  should be changed so that it consumes the shiftOutCode before returning:

                (aStream atEnd or: [aStream peek == self shiftOutCode]) ifTrue: [ aStream next. ^nSextets ].

 

Terry

 


 

From: Kogan, Tamara [[hidden email]]
Sent: Tuesday, March 05, 2013 9:49 AM
To: Boris Popov, DeepCove Labs; Terry Raymond; VWNC
Subject: RE: [vwnc] What is with the UTF7 encoder?

 

I guess it is because, since 1999, the method String class>>fromIntegerArray:encoding: has been implemented to decode into a ByteString, not a ByteArray.

 

Tamara Kogan

Smalltalk Development

Cincom Systems

 


 


_______________________________________________