VW 7.9.1 UTF-8 decoder error

Richard Sargent (again)
I had a failing test that was reading an 8-bit file-out. The test reads the file as if it were UTF-8 encoded, and the 8-bit characters cause the UTF-8 decoder to produce an incorrect result.

The file contains the byte 16rF3 (2r11110011), which triggers the 21-bit UTF-8 decoding. As the lead byte of a 4-byte sequence, it makes the decoder expect 4 bytes of the form
11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

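The bytes following the F3 are normal ASCII (high bit clear), not continuation bytes of the form 10xxxxxx. The sequence is therefore ill-formed, and the decoder should treat it as an error.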
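To make the check concrete, here is a minimal sketch in Python (not VisualWorks code) of the test a decoder could apply at the F3 byte. The function names are hypothetical, and the check is deliberately simplified: it only verifies the lead byte and the 10xxxxxx pattern of the following bytes, not overlong forms or out-of-range code points.

```python
def expected_length(lead: int) -> int:
    """Return the sequence length announced by a UTF-8 lead byte, or 0 if
    the byte cannot start a sequence (hypothetical helper)."""
    if lead < 0x80:
        return 1              # 0xxxxxxx: plain ASCII
    if 0xC2 <= lead <= 0xDF:
        return 2              # 110xxxxx
    if 0xE0 <= lead <= 0xEF:
        return 3              # 1110xxxx
    if 0xF0 <= lead <= 0xF4:
        return 4              # 11110xxx (16rF3 falls here)
    return 0                  # stray continuation byte or invalid lead


def is_continuation(byte: int) -> bool:
    """True if the byte has the form 10xxxxxx."""
    return (byte & 0xC0) == 0x80


def is_well_formed_at(data: bytes, start: int = 0) -> bool:
    """Simplified check: does a complete lead + continuation pattern
    begin at data[start]?"""
    length = expected_length(data[start])
    if length == 0:
        return False
    tail = data[start + 1:start + length]
    # All announced continuation bytes must be present and match 10xxxxxx.
    return len(tail) == length - 1 and all(is_continuation(b) for b in tail)


# F3 followed by ASCII, as in the file-out: ill-formed.
print(is_well_formed_at(b"\xf3abc"))
# F3 followed by three continuation bytes: matches the pattern.
print(is_well_formed_at(b"\xf3\x80\x80\x80"))
```

With F3 followed by ASCII the check fails at the first following byte, which is the point at which a decoder should stop and report the problem.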

RFC 3629 states "Implementations of the decoding algorithm MUST protect against decoding invalid sequences." The Unicode Standard requires decoders to "...treat any ill-formed code unit sequence as an error condition. This guarantees that it will neither interpret nor emit an ill-formed code unit sequence."
(from Wikipedia's UTF-8 article)
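For comparison, this is how a strict decoder behaves on such input. The sketch uses Python's built-in codecs purely as an illustration of the behaviour the quoted passages describe; it says nothing about VisualWorks internals. The sample bytes are made up, and reading them as Latin-1 is an assumption about what an 8-bit file-out might contain.

```python
# Hypothetical sample: an 8-bit text where 16rF3 stands for a single
# character, followed by ordinary ASCII.
raw = b"Coraz\xf3n"

try:
    raw.decode("utf-8")          # strict error handling is the default
    strict_result = "decoded"
except UnicodeDecodeError as err:
    # The decoder refuses the ill-formed sequence and reports where it starts.
    strict_result = f"error at byte offset {err.start}"

# Assumption: the file is really in an 8-bit encoding such as Latin-1.
latin1_text = raw.decode("latin-1")

print(strict_result)
print(latin1_text)
```

The strict decode raises instead of silently consuming the ASCII bytes after F3, so the mismatch between the file's real encoding and the requested one surfaces immediately.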

vwnc mailing list