The Trunk: Multilingual-tonyg.236.mcz

David T. Lewis uploaded a new version of Multilingual to project The Trunk:
http://source.squeak.org/trunk/Multilingual-tonyg.236.mcz

==================== Summary ====================

Name: Multilingual-tonyg.236
Author: tonyg
Time: 31 January 2018, 11:19:22.844612 pm
UUID: 62b136a5-9964-42d9-9397-2a6aa303f339
Ancestors: Multilingual-tonyg.235

Properly report short sequences as InvalidUTF8 rather than out-of-bounds subscript. Fixes a failing UTF8EdgeCaseTest>>testSequencesWithLastContinuationByteMissing.

=============== Diff against Multilingual-tonyg.235 ===============

Item was changed:
----- Method: UTF8TextConverter class>>decodeByteString: (in category 'conversion') -----
decodeByteString: aByteString
"Convert the given string from UTF-8 using the fast path if converting to Latin-1"

+ | outStream lastIndex nextIndex limit byte1 byte2 byte3 byte4 unicode |
- | outStream lastIndex nextIndex byte1 byte2 byte3 byte4 unicode |
lastIndex := 1.
(nextIndex := ByteString findFirstInString: aByteString inSet: latin1Map startingAt: lastIndex) = 0
ifTrue: [ ^aByteString ].
+ limit := aByteString size.
+ outStream := (String new: limit) writeStream.
- outStream := (String new: aByteString size) writeStream.
[
outStream next: nextIndex - lastIndex putAll: aByteString startingAt: lastIndex.
byte1 := aByteString byteAt: nextIndex.
(byte1 bitAnd: 16rE0) = 192 ifTrue: [ "two bytes"
+ nextIndex < limit ifFalse: [ ^ self errorMalformedInput: aByteString ].
byte2 := aByteString byteAt: (nextIndex := nextIndex + 1).
(byte2 bitAnd: 16rC0) = 16r80 ifFalse:[ ^self errorMalformedInput: aByteString ].
unicode := ((byte1 bitAnd: 31) bitShift: 6) + (byte2 bitAnd: 63)].
(byte1 bitAnd: 16rF0) = 224 ifTrue: [ "three bytes"
+ (nextIndex + 2) <= limit ifFalse: [ ^ self errorMalformedInput: aByteString ].
byte2 := aByteString byteAt: (nextIndex := nextIndex + 1).
(byte2 bitAnd: 16rC0) = 16r80 ifFalse:[ ^self errorMalformedInput: aByteString ].
byte3 := aByteString byteAt: (nextIndex := nextIndex + 1).
(byte3 bitAnd: 16rC0) = 16r80 ifFalse:[ ^self errorMalformedInput: aByteString ].
unicode := ((byte1 bitAnd: 15) bitShift: 12) + ((byte2 bitAnd: 63) bitShift: 6)
+ (byte3 bitAnd: 63)].
(byte1 bitAnd: 16rF8) = 240 ifTrue: [ "four bytes"
+ (nextIndex + 3) <= limit ifFalse: [ ^ self errorMalformedInput: aByteString ].
byte2 := aByteString byteAt: (nextIndex := nextIndex + 1).
(byte2 bitAnd: 16rC0) = 16r80 ifFalse:[ ^self errorMalformedInput: aByteString ].
byte3 := aByteString byteAt: (nextIndex := nextIndex + 1).
(byte3 bitAnd: 16rC0) = 16r80 ifFalse:[ ^self errorMalformedInput: aByteString ].
byte4 := aByteString byteAt: (nextIndex := nextIndex + 1).
(byte4 bitAnd: 16rC0) = 16r80 ifFalse:[ ^self errorMalformedInput: aByteString ].
unicode := ((byte1 bitAnd: 16r7) bitShift: 18) +
((byte2 bitAnd: 63) bitShift: 12) +
((byte3 bitAnd: 63) bitShift: 6) +
(byte4 bitAnd: 63)].
unicode ifNil: [ ^self errorMalformedInput: aByteString ].
unicode = 16rFEFF ifFalse: [ "Skip byte order mark"
outStream nextPut: (Unicode value: unicode) ].
lastIndex := nextIndex + 1.
(nextIndex := ByteString findFirstInString: aByteString inSet: latin1Map startingAt: lastIndex) = 0 ] whileFalse.
^outStream
next: aByteString size - lastIndex + 1 putAll: aByteString startingAt: lastIndex;
contents
!
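
The effect of the new limit checks can be tried from a workspace. The following is a minimal sketch, not part of the commit: it assumes that errorMalformedInput: signals InvalidUTF8 (as the summary states) and that InvalidUTF8 can be caught with on:do:. The variable names truncated and result are made up for the illustration.

```smalltalk
"Sketch only: decode a three-byte sequence whose last continuation byte is missing.
 16rE2 16r82 16rAC would be the euro sign; here the final byte is dropped."
| truncated result |
truncated := String
	with: (Character value: 16rE2)
	with: (Character value: 16r82).

"With the limit check in place, (nextIndex + 2) <= limit fails for this input
 (1 + 2 > 2), so the decoder reports malformed input instead of reading
 past the end of the string."
result := [ UTF8TextConverter decodeByteString: truncated ]
	on: InvalidUTF8
	do: [ :ex | #invalid ].
result
```

If the assumptions hold, result should be #invalid in an image with version 236 loaded. In version 235 the same input would be expected to fail inside byteAt: with a subscript out-of-bounds error, which the InvalidUTF8 handler would not catch.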