I had a failing test that was reading an 8-bit file-out. The test reads the file as UTF-8 encoded, and the 8-bit characters cause the UTF-8 decoding to produce an incorrect result.
The file contains the character 16rF3, which as a lead byte starts a 4-byte (21-bit) UTF-8 sequence. The decoder therefore expects 4 bytes of the form

    11110xxx | 10xxxxxx | 10xxxxxx | 10xxxxxx

but the bytes following the 16rF3 are plain ASCII (high bit clear), so they are not valid continuation bytes and the sequence is ill-formed.
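
To illustrate the failure outside Smalltalk, here is a small Python sketch (the byte values are invented, not the actual file contents): a 16rF3 lead byte followed by plain ASCII is rejected by a strict UTF-8 decoder.

    # 0xF3 opens a 4-byte sequence, but the following ASCII bytes
    # (high bit clear) are not the required 10xxxxxx continuation bytes.
    data = bytes([0xF3]) + b'abcdef'
    try:
        data.decode('utf-8')
    except UnicodeDecodeError as e:
        print(e)  # ... position 0: invalid continuation byte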
RFC 3629 states "Implementations of the decoding algorithm MUST protect against decoding invalid sequences." The Unicode Standard requires decoders to "...treat any ill-formed code unit sequence as an error condition. This guarantees that it will neither interpret nor emit an ill-formed code unit sequence." (from Wikipedia's UTF-8 article)
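
As a sketch of the behaviors those requirements permit (again in Python, with invented data): a conforming decoder either signals the error or substitutes U+FFFD for the ill-formed bytes, but never interprets them as if they were valid.

    data = bytes([0xF3]) + b'abc'
    print(data.decode('utf-8', errors='replace'))  # '\ufffdabc': the 0xF3 becomes U+FFFD
    # data.decode('utf-8', errors='strict')        # would raise UnicodeDecodeError instead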