Status: New
Owner: ----
New issue 3360 by sven.van.caekenberghe: TextConverter handling of binary
streams is wrong
http://code.google.com/p/pharo/issues/detail?id=3360

It seems that the way binary (#isBinary true) streams are handled by
TextConverter and its subclasses is wrong. When given a binary stream, the
core text converter methods (#nextPut:toStream: and #nextFromStream:) simply
no longer encode or decode at all.
Moreover, the unit test UTF8TextConverter>>#testPutSingleCharacter seems
plain wrong. The actual encoded bytes should be #[97 226 130 172].
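
For reference, the three bytes expected for the euro sign (U+20AC) follow from the standard three-byte UTF-8 layout 1110xxxx 10xxxxxx 10xxxxxx. The workspace snippet below is only an illustrative sketch of that arithmetic, not code from TextConverter; the variable name is made up for the example.

```smalltalk
| code |
code := 16r20AC.	"the euro sign, a code point in the three-byte range"
{
	"lead byte: marker 1110 plus the top 4 bits of the code point"
	16rE0 bitOr: (code bitShift: -12).
	"continuation byte: marker 10 plus the middle 6 bits"
	16r80 bitOr: ((code bitShift: -6) bitAnd: 16r3F).
	"continuation byte: marker 10 plus the low 6 bits"
	16r80 bitOr: (code bitAnd: 16r3F)
} asByteArray
"evaluates to #[226 130 172]; $a (97) is plain ASCII and stays a single byte"
```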
However, this behavior seems to have been added by design, so it is hard to
estimate the impact of changing it.
It is currently very ugly to get a binary UTF-8 encoding: one has to write
to a character stream and then turn those characters into bytes.
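
A minimal sketch of that workaround, assuming the converter does encode when the target is a character (non-binary) stream, as described above. The variable names are made up for the example, and I have not checked the result on every image version.

```smalltalk
| converter string encoded bytes |
converter := UTF8TextConverter new.
string := String with: $a with: (Unicode value: 16r20AC).
"Step 1: encode into a character stream; each resulting Character
 carries one UTF-8 byte value (0-255)"
encoded := String streamContents: [ :stream |
	string do: [ :each |
		converter nextPut: each toStream: stream ] ].
"Step 2: turn those characters into real bytes"
bytes := encoded asByteArray.
"bytes should now hold the encoding this issue expects, #[97 226 130 172]"
```

The detour through an intermediate String is what makes this ugly: the data is copied twice just to end up with a ByteArray.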
I wrote an alternative UTF-8 encoder as a support class to the Zinc HTTP
Components (http://www.squeaksource.com/ZincHTTPComponents.html) together
with the following unit test:

testUTF8Encoder
    "The examples are taken from http://en.wikipedia.org/wiki/UTF-8#Description"
    | encoder inputBytes outputBytes inputString outputString |
    encoder := ZnUTF8Encoder new.
    inputString := String
        with: $$
        with: (Unicode value: 16r00A2)
        with: (Unicode value: 16r20AC)
        with: (Unicode value: 16r024B62).
    inputBytes := #[16r24 16rC2 16rA2 16rE2 16r82 16rAC 16rF0 16rA4 16rAD 16rA2].
    outputBytes := self encodeString: inputString with: encoder.
    self assert: outputBytes = inputBytes.
    outputString := self decodeBytes: inputBytes with: encoder.
    self assert: outputString = inputString

based on the helper methods:

encodeString: string with: encoder
    ^ ByteArray streamContents: [ :stream |
        string do: [ :each |
            encoder nextPut: each toStream: stream ] ]

decodeBytes: bytes with: encoder
    | input |
    input := bytes readStream.
    ^ String streamContents: [ :stream |
        [ input atEnd ] whileFalse: [
            stream nextPut: (encoder nextFromStream: input) ] ]

The new encoder code is simpler, but it might not handle everything that is
needed (leading chars, language codes). Is all of that still needed?

Sven