The Inbox: Multilingual-tonyg.236.mcz

commits-2
A new version of Multilingual was added to project The Inbox:
http://source.squeak.org/inbox/Multilingual-tonyg.236.mcz

==================== Summary ====================

Name: Multilingual-tonyg.236
Author: tonyg
Time: 31 January 2018, 11:19:22.844612 pm
UUID: 62b136a5-9964-42d9-9397-2a6aa303f339
Ancestors: Multilingual-tonyg.235

Properly report short sequences as InvalidUTF8 rather than out-of-bounds subscript. Fixes a failing UTF8EdgeCaseTest>>testSequencesWithLastContinuationByteMissing.

=============== Diff against Multilingual-tonyg.235 ===============

Item was changed:
  ----- Method: UTF8TextConverter class>>decodeByteString: (in category 'conversion') -----
  decodeByteString: aByteString
  "Convert the given string from UTF-8 using the fast path if converting to Latin-1"
 
+ | outStream lastIndex nextIndex limit byte1 byte2 byte3 byte4 unicode |
- | outStream lastIndex nextIndex byte1 byte2 byte3 byte4 unicode |
  lastIndex := 1.
  (nextIndex := ByteString findFirstInString: aByteString inSet: latin1Map startingAt: lastIndex) = 0
  ifTrue: [ ^aByteString ].
+ limit := aByteString size.
+ outStream := (String new: limit) writeStream.
- outStream := (String new: aByteString size) writeStream.
  [
  outStream next: nextIndex - lastIndex putAll: aByteString startingAt: lastIndex.
  byte1 := aByteString byteAt: nextIndex.
  (byte1 bitAnd: 16rE0) = 192 ifTrue: [ "two bytes"
+ nextIndex < limit ifFalse: [ ^ self errorMalformedInput: aByteString ].
  byte2 := aByteString byteAt: (nextIndex := nextIndex + 1).
  (byte2 bitAnd: 16rC0) = 16r80 ifFalse:[ ^self errorMalformedInput: aByteString ].
  unicode := ((byte1 bitAnd: 31) bitShift: 6) + (byte2 bitAnd: 63)].
  (byte1 bitAnd: 16rF0) = 224 ifTrue: [ "three bytes"
+ (nextIndex + 2) <= limit ifFalse: [ ^ self errorMalformedInput: aByteString ].
  byte2 := aByteString byteAt: (nextIndex := nextIndex + 1).
  (byte2 bitAnd: 16rC0) = 16r80 ifFalse:[ ^self errorMalformedInput: aByteString ].
  byte3 := aByteString byteAt: (nextIndex := nextIndex + 1).
  (byte3 bitAnd: 16rC0) = 16r80 ifFalse:[ ^self errorMalformedInput: aByteString ].
  unicode := ((byte1 bitAnd: 15) bitShift: 12) + ((byte2 bitAnd: 63) bitShift: 6)
  + (byte3 bitAnd: 63)].
  (byte1 bitAnd: 16rF8) = 240 ifTrue: [ "four bytes"
+ (nextIndex + 3) <= limit ifFalse: [ ^ self errorMalformedInput: aByteString ].
  byte2 := aByteString byteAt: (nextIndex := nextIndex + 1).
  (byte2 bitAnd: 16rC0) = 16r80 ifFalse:[ ^self errorMalformedInput: aByteString ].
  byte3 := aByteString byteAt: (nextIndex := nextIndex + 1).
  (byte3 bitAnd: 16rC0) = 16r80 ifFalse:[ ^self errorMalformedInput: aByteString ].
  byte4 := aByteString byteAt: (nextIndex := nextIndex + 1).
  (byte4 bitAnd: 16rC0) = 16r80 ifFalse:[ ^self errorMalformedInput: aByteString ].
  unicode := ((byte1 bitAnd: 16r7) bitShift: 18) +
  ((byte2 bitAnd: 63) bitShift: 12) +
  ((byte3 bitAnd: 63) bitShift: 6) +
  (byte4 bitAnd: 63)].
  unicode ifNil: [ ^self errorMalformedInput: aByteString ].
  unicode = 16rFEFF ifFalse: [ "Skip byte order mark"
  outStream nextPut: (Unicode value: unicode) ].
  lastIndex := nextIndex + 1.
  (nextIndex := ByteString findFirstInString: aByteString inSet: latin1Map startingAt: lastIndex) = 0 ] whileFalse.
  ^outStream
  next: aByteString size - lastIndex + 1 putAll: aByteString startingAt: lastIndex;
  contents
  !
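
The effect of the added bounds checks can be sketched with a truncated three-byte sequence. This is only an illustration, not part of the commit: it assumes that errorMalformedInput: signals InvalidUTF8 (as the summary above implies), and the input bytes (the first two bytes of the UTF-8 encoding of U+20AC) and the #invalid marker are chosen here for the example.

```smalltalk
"Hypothetical workspace snippet; assumes errorMalformedInput: signals InvalidUTF8."
| truncated result |
"16rE2 announces a three-byte sequence, but only one continuation byte follows."
truncated := String
	with: (Character value: 16rE2)
	with: (Character value: 16r82).
result := [ UTF8TextConverter decodeByteString: truncated ]
	on: InvalidUTF8
	do: [ :ex | ex return: #invalid ].
"With the (nextIndex + 2) <= limit check in place, the decoder should reject
 the input before reading past the end of the string."
result
```

Before this change, the same input would have reached byteAt: with an index beyond the end of the string, which is the out-of-bounds subscript error the summary mentions.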