Andreas Raab uploaded a new version of Multilingual to project The Trunk:
http://source.squeak.org/trunk/Multilingual-ul.85.mcz

==================== Summary ====================

Name: Multilingual-ul.85
Author: ul
Time: 6 February 2010, 12:37:47.216 am
UUID: 47b3a790-8f3b-6341-abf3-21bf95892821
Ancestors: Multilingual-nice.84

- fix #basicNext: and #basicUpTo: in MultiByteFileStream
- add chunk reading capabilities to TextConverter
- assume that MultiByteFileStream's converter is properly initialized in #next
- MultiByteFileStream >> #nextChunk uses its converter's chunk reading capabilities; this gives a >3x speedup if the file is UTF-8 encoded
- fix: MultiByteFileStream lost its position if the ! character was encoded in more than one byte (e.g. UTF-16)

=============== Diff against Multilingual-nice.84 ===============

Item was changed:
  ----- Method: MultiByteFileStream>>nextChunk (in category 'fileIn/Out') -----
  nextChunk
  	"Answer the contents of the receiver, up to the next terminator character. Doubled terminators indicate an embedded terminator character."
+ 	^converter nextChunkFromStream: self!
- 	self skipSeparators.
- 	^self parseLangTagFor: (
- 		String new: 1000 streamContents: [ :stream |
- 			| character |
- 			[
- 				(character := self next) == nil or: [
- 					character == $!! and: [
- 						self next ~~ $!! ] ] ]
- 				whileFalse: [ stream nextPut: character ].
- 			character ifNotNil: [ self skip: -1 ] ])!

Item was added:
+ ----- Method: TextConverter>>nextChunkFromStream: (in category 'fileIn/Out') -----
+ nextChunkFromStream: input
+ 	"Answer the contents of input, up to the next terminator character. Doubled terminators indicate an embedded terminator character."
+ 
+ 	input skipSeparators.
+ 	^input parseLangTagFor: (
+ 		String new: 1000 streamContents: [ :output |
+ 			| character state |
+ 			[
+ 				(character := self nextFromStream: input) == nil or: [
+ 					character == $!! and: [
+ 						state := self saveStateOf: input.
+ 						(self nextFromStream: input) ~~ $!! ] ] ]
+ 				whileFalse: [ output nextPut: character ].
+ 			character ifNotNil: [
+ 				self restoreStateOf: input with: state ] ])!

Item was added:
+ ----- Method: UTF8TextConverter>>nextChunkFromStream: (in category 'fileIn/Out') -----
+ nextChunkFromStream: input
+ 	"Answer the contents of input, up to the next terminator character. Doubled terminators indicate an embedded terminator character."
+ 
+ 	input skipSeparators.
+ 	^input parseLangTagFor: (
+ 		String new: 1000 streamContents: [ :stream |
+ 			[
+ 				stream nextPutAll: (input basicUpTo: $!!).
+ 				input basicNext == $!! ]
+ 				whileTrue: [
+ 					stream nextPut: $!! ].
+ 			input atEnd ifFalse: [ input skip: -1 ] ]) utf8ToSqueak!

Item was changed:
  ----- Method: MultiByteFileStream>>next (in category 'public') -----
  next
  	| char secondChar state |
+ 	char := converter nextFromStream: self.
- 	char := (converter ifNil: [ self converter ]) nextFromStream: self.
  	(wantsLineEndConversion == true and: [ lineEndConvention notNil ]) "#doConversion is inlined here"
  		ifTrue: [
  			char == Cr ifTrue: [
  				state := converter saveStateOf: self.
  				secondChar := self bareNext.
  				secondChar ifNotNil: [
  					secondChar == Lf ifFalse: [ converter restoreStateOf: self with: state ] ].
  				^Cr ].
  			char == Lf ifTrue: [ ^Cr ] ].
  	^char.
  !

Item was changed:
  ----- Method: MultiByteFileStream>>basicUpTo: (in category 'private basic') -----
+ basicUpTo: delim
+ 	"Fast version to speed up nextChunk"
+ 	| pos buffer count |
+ 	collection ifNotNil: [
+ 		(position < readLimit and: [
+ 			(pos := collection indexOf: delim startingAt: position + 1) <= readLimit and: [
+ 				pos > 0 ] ]) ifTrue: [
+ 					^collection copyFrom: position + 1 to: (position := pos) - 1 ] ].
+ 	pos := self position.
+ 	buffer := self basicNext: 2000.
+ 	(count := buffer indexOf: delim) > 0 ifTrue:
+ 		["Found the delimiter part way into buffer"
+ 		self position: pos + count.
+ 		^ buffer copyFrom: 1 to: count - 1].
+ 	self atEnd ifTrue:
+ 		["Never found it, and hit end of file"
+ 		^ buffer].
+ 	"Never found it, but there's more..."
+ 	^ buffer , (self basicUpTo: delim)!
- basicUpTo: delim
- 
- 	^ super upTo: delim.
- !

Item was changed:
  ----- Method: MultiByteFileStream>>basicNext: (in category 'private basic') -----
  basicNext: anInteger
+ 	^self basicNextInto: (self collectionSpecies new: anInteger)!
- 	^ super next: anInteger.
- !
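Some background on the chunk format that #nextChunk parses. A chunk runs up to the next single `!`, and a doubled `!!` stands for one literal `!`. The method sources in this diff are printed in that same format. That is why `$!!` appears wherever a method actually contains `$!`, and why each method ends with a `!`.

The UTF-8 specialization can scan raw bytes with #basicUpTo: and then decode the whole chunk once with #utf8ToSqueak. This is safe because every byte of a multi-byte UTF-8 sequence has its high bit set, so the byte 16r21 can only ever be `!`. The generic TextConverter version cannot rely on that. It decodes one character at a time and rewinds its one-character lookahead with #saveStateOf: and #restoreStateOf:, instead of using the old `self skip: -1`. According to the summary, the old approach lost the stream position when a character took more than one byte.

The following is a minimal workspace sketch of the same terminator logic. It assumes an already decoded, in-memory String on a plain ReadStream, and it is not the committed code. The names `readChunk`, `input` and `chunks` are hypothetical. The sketch looks ahead with #peek rather than reading one element and backing up, and it leaves out #parseLangTagFor: and all encoding concerns.

```smalltalk
"Hypothetical workspace sketch, not the committed code.
 Assumes an already decoded String, so one stream element is one Character."
| readChunk input chunks |
"readChunk answers the next chunk: text up to a single $!, with a doubled $! unescaped to one $!"
readChunk := [ :in |
	String new: 100 streamContents: [ :out |
		| done |
		done := false.
		[ done ] whileFalse: [
			"upTo: consumes the $! it stops at, or answers the rest at end of stream"
			out nextPutAll: (in upTo: $!).
			in peek == $!
				ifTrue: [ "doubled terminator: consume it and keep one literal $!"
					in next.
					out nextPut: $! ]
				ifFalse: [ "single terminator or end of input: the chunk is complete"
					done := true ] ] ] ].
"Hypothetical input with two chunks; the first embeds a literal ! written as !!"
input := ReadStream on: 'Transcript show: ''Hi!!''! 3 + 4!'.
chunks := OrderedCollection new.
[ input atEnd ] whileFalse: [
	input skipSeparators.
	input atEnd ifFalse: [ chunks add: (readChunk value: input) ] ].
chunks.
```

Evaluating this should leave two chunks in `chunks`. The first is the Transcript expression with its `!!` reduced to a single `!`, and the second is `3 + 4`. Using #peek avoids the back-up step entirely. That only works here because the sketch reads decoded characters, which is exactly the assumption the committed MultiByteFileStream code cannot make.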