Levente Uzonyi uploaded a new version of Collections to project The Trunk:
http://source.squeak.org/trunk/Collections-ul.284.mcz

==================== Summary ====================

Name: Collections-ul.284
Author: ul
Time: 29 January 2010, 1:37:36.08 am
UUID: 855b55a0-8aa2-ed4a-8612-d1153fa8f656
Ancestors: Collections-ul.282

- fix: ignore Byte Order Mark in ByteString >> #utf8ToSqueak

=============== Diff against Collections-ul.282 ===============

Item was changed:
  ----- Method: ByteString>>utf8ToSqueak (in category 'converting') -----
  utf8ToSqueak
  	"Convert the given string from UTF-8 using the fast path if converting to Latin-1"
  	| outStream lastIndex nextIndex byte1 byte2 byte3 byte4 unicode |
  	Latin1ToUtf8Map ifNil:[^super utf8ToSqueak]. "installation guard"
  	lastIndex := 1.
  	nextIndex := ByteString findFirstInString: self inSet: Latin1ToUtf8Map startingAt: lastIndex.
  	nextIndex = 0 ifTrue:[^self].
  	outStream := (String new: self size) writeStream.
  	[outStream next: nextIndex-lastIndex putAll: self startingAt: lastIndex.
  	byte1 := self byteAt: nextIndex.
  	(byte1 bitAnd: 16rE0) = 192 ifTrue: [ "two bytes"
  		byte2 := self byteAt: (nextIndex := nextIndex+1).
  		(byte2 bitAnd: 16rC0) = 16r80 ifFalse:[^self]. "invalid UTF-8; presume Latin-1"
  		unicode := ((byte1 bitAnd: 31) bitShift: 6) + (byte2 bitAnd: 63)].
  	(byte1 bitAnd: 16rF0) = 224 ifTrue: [ "three bytes"
  		byte2 := self byteAt: (nextIndex := nextIndex+1).
  		(byte2 bitAnd: 16rC0) = 16r80 ifFalse:[^self]. "invalid UTF-8; presume Latin-1"
  		byte3 := self byteAt: (nextIndex := nextIndex+1).
  		(byte3 bitAnd: 16rC0) = 16r80 ifFalse:[^self]. "invalid UTF-8; presume Latin-1"
  		unicode := ((byte1 bitAnd: 15) bitShift: 12) + ((byte2 bitAnd: 63) bitShift: 6)
  			+ (byte3 bitAnd: 63)].
  	(byte1 bitAnd: 16rF8) = 240 ifTrue: [ "four bytes"
  		byte2 := self byteAt: (nextIndex := nextIndex+1).
  		(byte2 bitAnd: 16rC0) = 16r80 ifFalse:[^self]. "invalid UTF-8; presume Latin-1"
  		byte3 := self byteAt: (nextIndex := nextIndex+1).
  		(byte3 bitAnd: 16rC0) = 16r80 ifFalse:[^self]. "invalid UTF-8; presume Latin-1"
  		byte4 := self byteAt: (nextIndex := nextIndex+1).
  		(byte4 bitAnd: 16rC0) = 16r80 ifFalse:[^self]. "invalid UTF-8; presume Latin-1"
  		unicode := ((byte1 bitAnd: 16r7) bitShift: 18) +
  			((byte2 bitAnd: 63) bitShift: 12) +
  			((byte3 bitAnd: 63) bitShift: 6) +
  			(byte4 bitAnd: 63)].
  	unicode ifNil:[^self]. "invalid UTF-8; presume Latin-1"
+ 	unicode = 16rFEFF ifFalse: [ "Skip byte order mark"
+ 		outStream nextPut: (Unicode value: unicode) ].
- 	outStream nextPut: (Unicode value: unicode).
  	lastIndex := nextIndex + 1.
  	nextIndex := ByteString findFirstInString: self inSet: Latin1ToUtf8Map startingAt: lastIndex.
  	nextIndex = 0] whileFalse.
  	outStream next: self size-lastIndex+1 putAll: self startingAt: lastIndex.
  	^outStream contents!
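To check the effect of this change in a workspace, something like the following should do. It is only a minimal sketch, not part of the commit, and it assumes an image with Collections-ul.284 loaded; the variable names bom and input are illustrative. Tracing the patched method by hand, the three bytes EF BB BF take the three-byte branch, decode to 16rFEFF and are skipped, so printing the last expression should answer true. With the ancestor version the decoded BOM character would instead be written to the front of the result.

```smalltalk
"Workspace sketch, not part of the commit. Assumes Collections-ul.284 is loaded."
| bom input |
"The UTF-8 encoding of U+FEFF (byte order mark) is the byte sequence EF BB BF."
bom := String
	with: (Character value: 16rEF)
	with: (Character value: 16rBB)
	with: (Character value: 16rBF).
input := bom , 'abc'.
"With the patched method the leading BOM should be dropped from the result."
input utf8ToSqueak = 'abc'
```

Note that the check is on the decoded code point, so the mark would be skipped wherever it appears in the string, not only at the start.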