The Trunk: Multilingual-ul.85.mcz

commits-2
Andreas Raab uploaded a new version of Multilingual to project The Trunk:
http://source.squeak.org/trunk/Multilingual-ul.85.mcz

==================== Summary ====================

Name: Multilingual-ul.85
Author: ul
Time: 6 February 2010, 12:37:47.216 am
UUID: 47b3a790-8f3b-6341-abf3-21bf95892821
Ancestors: Multilingual-nice.84

- fix #basicNext: and #basicUpTo: in MultiByteFileStream
- add chunk reading capabilities to TextConverter
- assume that MultiByteFileStream's converter is properly initialized in #next
- MultiByteFileStream >> #nextChunk uses its converter's chunk reading capabilities; this gives a >3x speedup if the file is UTF-8 encoded
- fix: MultiByteFileStream lost its position if the ! character was encoded in more than a single byte (e.g. UTF-16)
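
The chunk format read by #nextChunk ends each chunk with a single ! and writes a literal ! inside a chunk as !!. The method sources in the diff below are printed in that same chunk format, which is why $! appears there as $!! and each method ends in a lone !. The workspace snippet here is a minimal sketch of the rule on an in-memory ReadStream, assuming no TextConverter is involved (one element is one character); it is an illustration, not code from this commit.

```smalltalk
"Sketch only: decode one chunk from a plain ReadStream.
 A single $! ends the chunk; a doubled $!$! stands for one literal $!."
| input chunk |
input := ReadStream on: 'Transcript show: ''hi!!''! rest'.
chunk := String new: 100 streamContents: [:out |
	| ch |
	[(ch := input next) isNil or: [ch == $! and: [input peek ~~ $!]]]
		whileFalse: [
			out nextPut: ch.
			"an escaped pair was seen: drop its second character"
			ch == $! ifTrue: [input next]]].
chunk  "'Transcript show: ''hi!'''; input is left just before ' rest'"
```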

=============== Diff against Multilingual-nice.84 ===============

Item was changed:
  ----- Method: MultiByteFileStream>>nextChunk (in category 'fileIn/Out') -----
  nextChunk
  "Answer the contents of the receiver, up to the next terminator character. Doubled terminators indicate an embedded terminator character."
 
+ ^converter nextChunkFromStream: self!
- self skipSeparators.
- ^self parseLangTagFor: (
- String new: 1000 streamContents: [ :stream |
- | character |
- [
- (character := self next) == nil or: [
- character == $!! and: [
- self next ~~ $!! ] ] ]
- whileFalse: [ stream nextPut: character ].
- character ifNotNil: [ self skip: -1 ] ])!

Item was added:
+ ----- Method: TextConverter>>nextChunkFromStream: (in category 'fileIn/Out') -----
+ nextChunkFromStream: input
+ "Answer the contents of input, up to the next terminator character. Doubled terminators indicate an embedded terminator character."
+
+ input skipSeparators.
+ ^input parseLangTagFor: (
+ String new: 1000 streamContents: [ :output |
+ | character state |
+ [
+ (character := self nextFromStream: input) == nil or: [
+ character == $!! and: [
+ state := self saveStateOf: input.
+ (self nextFromStream: input) ~~ $!! ] ] ]
+ whileFalse: [ output nextPut: character ].
+ character ifNotNil: [
+ self restoreStateOf: input with: state ] ])!
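
Why the old #nextChunk lost its position with multi-byte encodings: after the terminator it read one lookahead character and, if that was not a second !, backed up with skip: -1, which moves back one byte rather than one character. The generic TextConverter version above saves the converter state before the lookahead and restores it instead. The sketch below replays both strategies on a ByteArray holding UTF-16BE text; plain position/position: stand in for #saveStateOf:/#restoreStateOf:, which is an assumption about what those amount to for a stateless converter.

```smalltalk
"Sketch: the bytes of '!xy' in UTF-16BE, two bytes per character."
| stream saved afterSkip afterRestore |
stream := ReadStream on: #[0 33 0 120 0 121].
stream next: 2.                    "consume the terminator $!"
saved := stream position.          "stand-in for saveStateOf: (assumption)"
stream next: 2.                    "lookahead: $x, so not an escaped pair"
stream skip: -1.                   "old approach: back up a single byte"
afterSkip := stream position.      "3, inside the two bytes of $x"
stream position: saved.            "new approach: restore the saved state"
afterRestore := stream position.   "2, the first byte of $x"
```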

Item was added:
+ ----- Method: UTF8TextConverter>>nextChunkFromStream: (in category 'fileIn/Out') -----
+ nextChunkFromStream: input
+ "Answer the contents of input, up to the next terminator character. Doubled terminators indicate an embedded terminator character."
+
+ input skipSeparators.
+ ^input parseLangTagFor: (
+ String new: 1000 streamContents: [ :stream |
+ [
+ stream nextPutAll: (input basicUpTo: $!!).
+ input basicNext == $!! ]
+ whileTrue: [
+ stream nextPut: $!! ].
+ input atEnd ifFalse: [ input skip: -1 ] ]) utf8ToSqueak!
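
The UTF-8 specialization can work on raw bytes because every byte of a multi-byte UTF-8 sequence is 128 or above, so a byte equal to 33 is always a real !. It collects undecoded bytes with #basicUpTo: and decodes the whole chunk once with #utf8ToSqueak, rather than decoding character by character, which is presumably where the speedup comes from. The sketch below mirrors its loop on an in-memory ByteArray; it is an illustration under that assumption, not the shipped method.

```smalltalk
"Sketch: bytes for 'x', U+20AC (E2 82 AC), '!!', 'y', '!', ' '.
 33 is the byte value of $!; none of E2 82 AC can equal it."
| input raw |
input := ReadStream on: #[120 16rE2 16r82 16rAC 33 33 121 33 32].
raw := WriteStream on: ByteArray new.
[raw nextPutAll: (input upTo: 33).
 input next == 33]    "a second 33 right after the first is an escaped !"
	whileTrue: [raw nextPut: 33].
"The real method also backs up one byte here unless it is at the end."
raw contents  "#[120 226 130 172 33 121], ready to be decoded in one go"
```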

Item was changed:
  ----- Method: MultiByteFileStream>>next (in category 'public') -----
  next
 
  | char secondChar state |
+ char := converter nextFromStream: self.
- char := (converter ifNil: [ self converter ]) nextFromStream: self.
  (wantsLineEndConversion == true and: [ lineEndConvention notNil ]) "#doConversion is inlined here"
  ifTrue: [
  char == Cr ifTrue: [
  state := converter saveStateOf: self.
  secondChar := self bareNext.
  secondChar ifNotNil: [
  secondChar == Lf ifFalse: [ converter restoreStateOf: self with: state ] ].
  ^Cr ].
  char == Lf ifTrue: [
  ^Cr ] ].
  ^char.
 
  !
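
With line-end conversion enabled, #next returns CR for both CR LF and a lone LF; after a CR it reads one more character and restores the converter state if that character is not LF. A minimal sketch of the same rule on a String stream, where #peek can replace the save/restore step because each element is one character (an assumption that does not hold for multi-byte file encodings):

```smalltalk
"Sketch: normalise CR LF and lone LF to CR, as the inlined #doConversion does."
| cr lf in out ch |
cr := Character cr.
lf := Character lf.
in := ReadStream on: 'a', (String with: cr with: lf), 'b', (String with: lf), 'c'.
out := WriteStream on: String new.
[(ch := in next) isNil] whileFalse: [
	ch == cr ifTrue: [in peek == lf ifTrue: [in next]].
	ch == lf ifTrue: [ch := cr].
	out nextPut: ch].
out contents  "a, CR, b, CR, c"
```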

Item was changed:
  ----- Method: MultiByteFileStream>>basicUpTo: (in category 'private basic') -----
+ basicUpTo: delim
+ "Fast version to speed up nextChunk"
+ | pos buffer count |
+ collection ifNotNil: [
+ (position < readLimit and: [
+ (pos := collection indexOf: delim startingAt: position + 1) <= readLimit and: [
+ pos > 0 ] ]) ifTrue: [
+ ^collection copyFrom: position + 1 to: (position := pos) - 1 ] ].
+ pos := self position.
+ buffer := self basicNext: 2000.
+ (count := buffer indexOf: delim) > 0 ifTrue:
+ ["Found the delimiter part way into buffer"
+ self position: pos + count.
+ ^ buffer copyFrom: 1 to: count - 1].
+ self atEnd ifTrue:
+ ["Never found it, and hit end of file"
+ ^ buffer].
+ "Never found it, but there's more..."
+ ^ buffer , (self basicUpTo: delim)!
- basicUpTo: delim
-
- ^ super upTo: delim.
- !
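
The new #basicUpTo: first looks for the delimiter in the bytes already buffered (collection up to readLimit). Failing that, it reads blocks of 2000 bytes with #basicNext:, repositions just after the delimiter when a block contains it, and otherwise recurses. Below is an iterative sketch of that block-reading fallback on an in-memory stream; the block size of 3 is hypothetical, chosen so the "not in this block" path runs, and a WriteStream replaces the recursive concatenation.

```smalltalk
"Sketch of the block-reading fallback in basicUpTo:, iterative instead of recursive."
| input blockSize result start block count |
input := ReadStream on: 'abcdefgh!ij'.
blockSize := 3.                      "hypothetical; the real method reads 2000"
result := WriteStream on: String new.
[start := input position.
 block := input next: blockSize.
 count := block indexOf: $!.
 count > 0
	ifTrue: [
		"found: continue reading right after the delimiter"
		input position: start + count.
		result nextPutAll: (block copyFrom: 1 to: count - 1).
		false]
	ifFalse: [
		"not in this block: keep it and read on unless at end of input"
		result nextPutAll: block.
		input atEnd not]] whileTrue.
result contents  "'abcdefgh'; input is left just before 'ij'"
```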

Item was changed:
  ----- Method: MultiByteFileStream>>basicNext: (in category 'private basic') -----
  basicNext: anInteger
 
+ ^self basicNextInto: (self collectionSpecies new: anInteger)!
- ^ super next: anInteger.
- !