The Trunk: Collections-ul.284.mcz

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

The Trunk: Collections-ul.284.mcz

commits-2
Levente Uzonyi uploaded a new version of Collections to project The Trunk:
http://source.squeak.org/trunk/Collections-ul.284.mcz

==================== Summary ====================

Name: Collections-ul.284
Author: ul
Time: 29 January 2010, 1:37:36.08 am
UUID: 855b55a0-8aa2-ed4a-8612-d1153fa8f656
Ancestors: Collections-ul.282

- fix: ignore Byte Order Mark in ByteString >> #utf8ToSqueak

=============== Diff against Collections-ul.282 ===============

Item was changed:
  ----- Method: ByteString>>utf8ToSqueak (in category 'converting') -----
  utf8ToSqueak
  "Convert the given string from UTF-8 using the fast path if converting to Latin-1"
  | outStream lastIndex nextIndex byte1 byte2 byte3 byte4 unicode |
  Latin1ToUtf8Map ifNil:[^super utf8ToSqueak]. "installation guard"
  lastIndex := 1.
  nextIndex := ByteString findFirstInString: self inSet: Latin1ToUtf8Map startingAt: lastIndex.
  nextIndex = 0 ifTrue:[^self].
  outStream := (String new: self size) writeStream.
  [outStream next: nextIndex-lastIndex putAll: self startingAt: lastIndex.
  byte1 := self byteAt: nextIndex.
  (byte1 bitAnd: 16rE0) = 192 ifTrue: [ "two bytes"
  byte2 := self byteAt: (nextIndex := nextIndex+1).
  (byte2 bitAnd: 16rC0) = 16r80 ifFalse:[^self]. "invalid UTF-8; presume Latin-1"
  unicode := ((byte1 bitAnd: 31) bitShift: 6) + (byte2 bitAnd: 63)].
  (byte1 bitAnd: 16rF0) = 224 ifTrue: [ "three bytes"
  byte2 := self byteAt: (nextIndex := nextIndex+1).
  (byte2 bitAnd: 16rC0) = 16r80 ifFalse:[^self]. "invalid UTF-8; presume Latin-1"
  byte3 := self byteAt: (nextIndex := nextIndex+1).
  (byte3 bitAnd: 16rC0) = 16r80 ifFalse:[^self]. "invalid UTF-8; presume Latin-1"
  unicode := ((byte1 bitAnd: 15) bitShift: 12) + ((byte2 bitAnd: 63) bitShift: 6)
  + (byte3 bitAnd: 63)].
  (byte1 bitAnd: 16rF8) = 240 ifTrue: [ "four bytes"
  byte2 := self byteAt: (nextIndex := nextIndex+1).
  (byte2 bitAnd: 16rC0) = 16r80 ifFalse:[^self]. "invalid UTF-8; presume Latin-1"
  byte3 := self byteAt: (nextIndex := nextIndex+1).
  (byte3 bitAnd: 16rC0) = 16r80 ifFalse:[^self]. "invalid UTF-8; presume Latin-1"
  byte4 := self byteAt: (nextIndex := nextIndex+1).
  (byte4 bitAnd: 16rC0) = 16r80 ifFalse:[^self]. "invalid UTF-8; presume Latin-1"
  unicode := ((byte1 bitAnd: 16r7) bitShift: 18) +
  ((byte2 bitAnd: 63) bitShift: 12) +
  ((byte3 bitAnd: 63) bitShift: 6) +
  (byte4 bitAnd: 63)].
  unicode ifNil:[^self]. "invalid UTF-8; presume Latin-1"
+ unicode = 16rFEFF ifFalse: [ "Skip byte order mark"
+ outStream nextPut: (Unicode value: unicode) ].
- outStream nextPut: (Unicode value: unicode).
  lastIndex := nextIndex + 1.
  nextIndex := ByteString findFirstInString: self inSet: Latin1ToUtf8Map startingAt: lastIndex.
  nextIndex = 0] whileFalse.
  outStream next: self size-lastIndex+1 putAll: self startingAt: lastIndex.
  ^outStream contents
  !