Levente Uzonyi uploaded a new version of Collections to project The Trunk:
http://source.squeak.org/trunk/Collections-ul.606.mcz ==================== Summary ==================== Name: Collections-ul.606 Author: ul Time: 1 April 2015, 11:35:01.718 pm UUID: cd3ba2f1-88d3-4f8a-a06b-089509ceccdf Ancestors: Collections-mt.605 Cache and share #separators and #nonSeparators in CharacterSet (just like #crlf). Remove them on #cleanUp. Implemented CharacterSet class >> #withAll:. Removed #noSeparatorMap, #noSeparators, #separatorMap from String class. Also removed the class variables CSNonSeparators, CSSeparators, and CSLineEnders. All users were rewritten to use CharacterSet's version of these sets. =============== Diff against Collections-mt.605 =============== Item was changed: ----- Method: ByteString>>substrings (in category 'converting') ----- substrings "Answer an array of the substrings that compose the receiver." | result end beginning | result := WriteStream on: (Array new: 10). end := 0. "find one substring each time through this loop" [ "find the beginning of the next substring" + beginning := self indexOfAnyOf: CharacterSet nonSeparators - beginning := self indexOfAnyOf: CSNonSeparators startingAt: end+1 ifAbsent: [ nil ]. beginning ~~ nil ] whileTrue: [ "find the end" + end := self indexOfAnyOf: CharacterSet separators - end := self indexOfAnyOf: CSSeparators startingAt: beginning ifAbsent: [ self size + 1 ]. end := end - 1. result nextPut: (self copyFrom: beginning to: end). ]. ^result contents! Item was changed: Collection subclass: #CharacterSet instanceVariableNames: 'map' + classVariableNames: 'CrLf NonSeparators Separators' - classVariableNames: 'CrLf' poolDictionaries: '' category: 'Collections-Support'! !CharacterSet commentStamp: '<historical>' prior: 0! A set of characters. Lookups for inclusion are very fast.! Item was added: + ----- Method: CharacterSet class>>cleanUp: (in category 'initialize-release') ----- + cleanUp: aggressive + + CrLf := NonSeparators := Separators := nil! Item was changed: ----- Method: CharacterSet class>>crlf (in category 'accessing') ----- crlf + + ^CrLf ifNil: [ CrLf := self with: Character cr with: Character lf ]! - CrLf ifNil: [CrLf := self with: Character cr with: Character lf]. - ^CrLf! Item was changed: + ----- Method: CharacterSet class>>nonSeparators (in category 'accessing') ----- - ----- Method: CharacterSet class>>nonSeparators (in category 'instance creation') ----- nonSeparators "return a set containing everything but the whitespace characters" + ^NonSeparators ifNil: [ + NonSeparators := self separators complement ]! - ^self separators complement! Item was changed: + ----- Method: CharacterSet class>>separators (in category 'accessing') ----- - ----- Method: CharacterSet class>>separators (in category 'instance creation') ----- separators "return a set containing just the whitespace characters" + ^Separators ifNil: [ + Separators := self new + addAll: Character separators; + yourself ]! - | set | - set := self empty. - set addAll: Character separators. - ^set! Item was added: + ----- Method: CharacterSet class>>withAll: (in category 'instance creation') ----- + withAll: aCollection + "Create a new CharacterSet containing all the characters from aCollection." + + ^self newFrom: aCollection! Item was changed: ----- Method: ReadStream>>nextFloat (in category 'accessing') ----- nextFloat "Read a floating point value from the receiver. This method is highly optimized for cases where many floating point values need to be read subsequently. And if this needs to go even faster, look at the inner loops fetching the characters - moving those into a plugin would speed things up even more." | buffer count sign index cc value digit fraction exp startIndex anyDigit digitNeeded | buffer := collection. count := readLimit. index := position+1. "Skip separators" + index := buffer indexOfAnyOf: CharacterSet nonSeparators startingAt: index. - index := ByteString findFirstInString: buffer inSet: String noSeparatorMap startingAt: index. index = 0 ifTrue:[self setToEnd. ^nil]. "check for sign" digitNeeded := false. sign := 1. cc := buffer byteAt: index. cc = 45 "$- asciiValue" ifTrue:[sign := -1. index := index+1. digitNeeded := true] ifFalse:[cc = 43 "$+ asciiValue" ifTrue:[index := index+1. digitNeeded := true]]. "Read integer part" startIndex := index. value := 0. [index <= count and:[ digit := (buffer byteAt: index) - 48. "$0 asciiValue" digit >= 0 and:[digit <= 9]]] whileTrue:[ value := value * 10 + digit. index := index + 1. ]. anyDigit := index > startIndex. index > count ifTrue:[ (digitNeeded and:[anyDigit not]) ifTrue:[^self error: 'At least one digit expected']. self setToEnd. ^value asFloat * sign]. (buffer byteAt: index) = 46 "$. asciiValue" ifTrue:["<integer>.<fraction>" index := index+1. startIndex := index. "NOTE: fraction and exp below can overflow into LargeInteger range. If they do, then things slow down horribly due to the relatively slow LargeInt -> Float conversion. This can be avoided by changing fraction and exp to use floats to begin with (0.0 and 1.0 respectively), however, this will give different results to Float>>readFrom: and it is not clear if that is acceptable here." fraction := 0. exp := 1. [index <= count and:[ digit := (buffer byteAt: index) - 48. "$0 asciiValue" digit >= 0 and:[digit <= 9]]] whileTrue:[ fraction := fraction * 10 + digit. exp := exp * 10. index := index + 1. ]. value := value + (fraction asFloat / exp asFloat). anyDigit := anyDigit or:[index > startIndex]. ]. value := value asFloat * sign. "At this point we require at least one digit to avoid allowing: - . ('0.0' without leading digits) - e32 ('0e32' without leading digits) - .e32 ('0.0e32' without leading digits) but these are currently allowed: - .5 (0.5) - 1. ('1.0') - 1e32 ('1.0e32') - 1.e32 ('1.0e32') - .5e32 ('0.5e32') " anyDigit ifFalse:["Check for NaN/Infinity first" (count - index >= 2 and:[(buffer copyFrom: index to: index+2) = 'NaN']) ifTrue:[position := index+2. ^Float nan * sign]. (count - index >= 7 and:[(buffer copyFrom: index to: index+7) = 'Infinity']) ifTrue:[position := index+7. ^Float infinity * sign]. ^self error: 'At least one digit expected' ]. index > count ifTrue:[self setToEnd. ^value asFloat]. (buffer byteAt: index) = 101 "$e asciiValue" ifTrue:["<number>e[+|-]<exponent>" index := index+1. "skip e" sign := 1. cc := buffer byteAt: index. cc = 45 "$- asciiValue" ifTrue:[sign := -1. index := index+1] ifFalse:[cc = 43 "$+ asciiValue" ifTrue:[index := index+1]]. startIndex := index. exp := 0. anyDigit := false. [index <= count and:[ digit := (buffer byteAt: index) - 48. "$0 asciiValue" digit >= 0 and:[digit <= 9]]] whileTrue:[ exp := exp * 10 + digit. index := index + 1. ]. index> startIndex ifFalse:[^self error: 'Exponent expected']. value := value * (10.0 raisedToInteger: exp * sign). ]. position := index-1. ^value! Item was changed: ArrayedCollection subclass: #String instanceVariableNames: '' + classVariableNames: 'AsciiOrder CSMacroCharacters CaseInsensitiveOrder CaseSensitiveOrder CrLfExchangeTable HtmlEntities LowercasingTable Tokenish UppercasingTable' - classVariableNames: 'AsciiOrder CSLineEnders CSMacroCharacters CSNonSeparators CSSeparators CaseInsensitiveOrder CaseSensitiveOrder CrLfExchangeTable HtmlEntities LowercasingTable Tokenish UppercasingTable' poolDictionaries: '' category: 'Collections-Strings'! !String commentStamp: '<historical>' prior: 0! A String is an indexed collection of Characters. Class String provides the abstract super class for ByteString (that represents an array of 8-bit Characters) and WideString (that represents an array of 32-bit characters). In the similar manner of LargeInteger and SmallInteger, those subclasses are chosen accordingly for a string; namely as long as the system can figure out so, the String is used to represent the given string. Strings support a vast array of useful methods, which can best be learned by browsing and trying out examples as you find them in the code. Here are a few useful methods to look at... String match: String contractTo: String also inherits many useful methods from its hierarchy, such as SequenceableCollection , SequenceableCollection copyReplaceAll:with: ! Item was changed: ----- Method: String class>>initialize (in category 'initialization') ----- initialize "self initialize" | order | AsciiOrder := (0 to: 255) as: ByteArray. CaseInsensitiveOrder := AsciiOrder copy. ($a to: $z) do: [:c | CaseInsensitiveOrder at: c asciiValue + 1 put: (CaseInsensitiveOrder at: c asUppercase asciiValue +1)]. "Case-sensitive compare sorts space, digits, letters, all the rest..." CaseSensitiveOrder := ByteArray new: 256 withAll: 255. order := -1. ' 0123456789' do: "0..10" [:c | CaseSensitiveOrder at: c asciiValue + 1 put: (order := order+1)]. ($a to: $z) do: "11-64" [:c | CaseSensitiveOrder at: c asUppercase asciiValue + 1 put: (order := order+1). CaseSensitiveOrder at: c asciiValue + 1 put: (order := order+1)]. 1 to: CaseSensitiveOrder size do: [:i | (CaseSensitiveOrder at: i) = 255 ifTrue: [CaseSensitiveOrder at: i put: (order := order+1)]]. order = 255 ifFalse: [self error: 'order problem']. "a table for translating to lower case" LowercasingTable := String withAll: (Character allByteCharacters collect: [:c | c asLowercase]). "a table for translating to upper case" UppercasingTable := String withAll: (Character allByteCharacters collect: [:c | c asUppercase]). "a table for testing tokenish (for fast numArgs)" Tokenish := String withAll: (Character allByteCharacters collect: [:c | c tokenish ifTrue: [c] ifFalse: [$~]]). + - - "CR and LF--characters that terminate a line" - CSLineEnders := CharacterSet crlf. - - "separators and non-separators" - CSSeparators := CharacterSet separators. - CSNonSeparators := CSSeparators complement. - "% and < for #expandMacros*" CSMacroCharacters := CharacterSet newFrom: '%<'. "a table for exchanging cr with lf and vica versa" CrLfExchangeTable := Character allByteCharacters collect: [ :each | each caseOf: { [ Character cr ] -> [ Character lf ]. [ Character lf ] -> [ Character cr ] } otherwise: [ each ] ]! Item was removed: - ----- Method: String class>>noSeparatorMap (in category 'accessing') ----- - noSeparatorMap - ^CSNonSeparators byteArrayMap! Item was removed: - ----- Method: String class>>noSeparators (in category 'accessing') ----- - noSeparators - ^ CSNonSeparators! Item was removed: - ----- Method: String class>>separatorMap (in category 'accessing') ----- - separatorMap - ^CSSeparators byteArrayMap! Item was changed: ----- Method: String>>indentationIfBlank: (in category 'paragraph support') ----- indentationIfBlank: aBlock "Answer the number of leading tabs in the receiver. If there are no visible characters, pass the number of tabs to aBlock and return its value." | leadingTabs nonTab nonTabIndex nonSepIndex lineEndIndex | nonTab := (CharacterSet with: Character tab) complement. nonTabIndex := self indexOfAnyOf: nonTab startingAt: 1. nonTabIndex = 0 ifTrue: [ "Only made of tabs or empty" ^aBlock value: self size]. leadingTabs := nonTabIndex - 1. + nonSepIndex := self indexOfAnyOf: CharacterSet nonSeparators startingAt: 1. - nonSepIndex := self indexOfAnyOf: CSNonSeparators startingAt: 1. nonSepIndex = 0 ifTrue: [ "Only made of separators" ^aBlock value: leadingTabs]. + lineEndIndex := self indexOfAnyOf: CharacterSet crlf startingAt: 1. - lineEndIndex := self indexOfAnyOf: CSLineEnders startingAt: 1. (lineEndIndex between: 1 and: nonSepIndex) ifTrue: [ "Only made of separators up to a line end" ^aBlock value: leadingTabs]. ^leadingTabs! Item was changed: ----- Method: String>>withBlanksTrimmed (in category 'converting') ----- withBlanksTrimmed "Return a copy of the receiver from which leading and trailing blanks have been trimmed." | first last | + first := self indexOfAnyOf: CharacterSet nonSeparators startingAt: 1 ifAbsent: [0]. - first := self indexOfAnyOf: CSNonSeparators startingAt: 1 ifAbsent: [0]. first = 0 ifTrue: [ ^'' ]. "no non-separator character" + last := self lastIndexOfAnyOf: CharacterSet nonSeparators startingAt: self size ifAbsent: [self size]. - last := self lastIndexOfAnyOf: CSNonSeparators startingAt: self size ifAbsent: [self size]. (first = 1 and: [ last = self size ]) ifTrue: [ ^self copy ]. ^self copyFrom: first to: last ! Item was changed: ----- Method: String>>withNoLineLongerThan: (in category 'converting') ----- withNoLineLongerThan: aNumber "Answer a string with the same content as receiver, but rewrapped so that no line has more characters than the given number" aNumber isNumber not | (aNumber < 1) ifTrue: [self error: 'too narrow']. ^self class new: self size * (aNumber + 1) // aNumber "provision for supplementary line breaks" streamContents: [ :stream | self lineIndicesDo: [ :start :endWithoutDelimiters :end | | pastEnd lineStart | pastEnd := endWithoutDelimiters + 1. "eliminate spaces at beginning of line" + lineStart := (self indexOfAnyOf: CharacterSet nonSeparators startingAt: start ifAbsent: [pastEnd]) min: pastEnd. - lineStart := (self indexOfAnyOf: CSNonSeparators startingAt: start ifAbsent: [pastEnd]) min: pastEnd. [| lineStop lineEnd spacePosition | lineEnd := lineStop := lineStart + aNumber min: pastEnd.. spacePosition := lineStart. [spacePosition < lineStop] whileTrue: [ + spacePosition := self indexOfAnyOf: CharacterSet separators startingAt: spacePosition + 1 ifAbsent: [pastEnd]. - spacePosition := self indexOfAnyOf: CSSeparators startingAt: spacePosition + 1 ifAbsent: [pastEnd]. spacePosition <= lineStop ifTrue: [lineEnd := spacePosition]. ]. "split before space or before lineStop if no space" stream nextPutAll: (self copyFrom: lineStart to: lineEnd - 1). "eliminate spaces at beginning of next line" + lineStart := self indexOfAnyOf: CharacterSet nonSeparators startingAt: lineEnd ifAbsent: [pastEnd]. - lineStart := self indexOfAnyOf: CSNonSeparators startingAt: lineEnd ifAbsent: [pastEnd]. lineStart <= endWithoutDelimiters ] whileTrue: [stream cr]. stream nextPutAll: (self copyFrom: pastEnd to: end) ] ]! Item was changed: ----- Method: String>>withSeparatorsCompacted (in category 'converting') ----- withSeparatorsCompacted "replace each sequences of whitespace by a single space character" "' test ' withSeparatorsCompacted = ' test '" "' test test' withSeparatorsCompacted = ' test test'" "'test test ' withSeparatorsCompacted = 'test test '" | out in next isSeparator | self isEmpty ifTrue: [^ self]. out := WriteStream on: (String new: self size). in := self readStream. isSeparator := [:char | char asciiValue < 256 + and: [CharacterSet separators includes: char]]. - and: [CSSeparators includes: char]]. [in atEnd] whileFalse: [ next := in next. (isSeparator value: next) ifTrue: [ out nextPut: $ . [in atEnd or: [next := in next. (isSeparator value: next) ifTrue: [false] ifFalse: [out nextPut: next. true]]] whileFalse] ifFalse: [out nextPut: next]]. ^ out contents! Item was changed: ----- Method: Text>>withBlanksTrimmed (in category 'converting') ----- withBlanksTrimmed "Return a copy of the receiver from which leading and trailing blanks have been trimmed." | first last | + first := string indexOfAnyOf: CharacterSet nonSeparators startingAt: 1. - first := string indexOfAnyOf: String noSeparators startingAt: 1 ifAbsent: [0]. first = 0 ifTrue: [ ^'' ]. "no non-separator character" + last := string lastIndexOfAnyOf: CharacterSet nonSeparators startingAt: self size ifAbsent: [self size]. - last := string lastIndexOfAnyOf: String noSeparators startingAt: self size ifAbsent: [self size]. (first = 1 and: [ last = self size ]) ifTrue: [ ^self copy ]. ^self copyFrom: first to: last ! |
Free forum by Nabble | Edit this page |