Status: FixedWaitingToBePharoed
Owner: stephane.ducasse
Labels: Milestone-1.3 Type-Squeak

New issue 3347 by stephane.ducasse: simplified and unified String's line-ending changing methods
http://code.google.com/p/pharo/issues/detail?id=3347

Levente Uzonyi uploaded a new version of Collections to project The Trunk:
http://source.squeak.org/trunk/Collections-ul.409.mcz

==================== Summary ====================

Name: Collections-ul.409
Author: ul
Time: 22 November 2010, 1:35:14.026 pm
UUID: 849eb41a-5717-9e44-ab23-eae36fb603bb
Ancestors: Collections-ul.408

- introduced String >> #withLineEndings:
- simplified and unified String's line-ending changing methods: #withInternetLineEndings, #withSqueakLineEndings and #withUnixLineEndings

=============== Diff against Collections-ul.408 ===============

Item was changed:
  ----- Method: String>>withInternetLineEndings (in category 'internet') -----
  withInternetLineEndings
  	"change line endings from CR's and LF's to CRLF's. This is probably in preparation for sending a string over the Internet"
+ 	^self withLineEndings: String crlf!
- 	^self class
- 		new: self size * 16 // 15 "provisions for CR-LF pairs"
- 		streamContents: [ :stream |
- 			self lineIndicesDo: [:start :endWithoutDelimiters :end |
- 				stream next: 1 + endWithoutDelimiters - start putAll: self startingAt: start.
- 				endWithoutDelimiters = end ifFalse: [
- 					stream crlf ] ] ]!

Item was added:
+ ----- Method: String>>withLineEndings: (in category 'internet') -----
+ withLineEndings: lineEndingString
+ 
+ 	| stream |
+ 	stream := nil.
+ 	self lineIndicesDo: [ :start :endWithoutDelimiters :end |
+ 		(stream isNil and: [ endWithoutDelimiters ~= end ]) ifTrue: [
+ 			(self copyFrom: endWithoutDelimiters + 1 to: end) = lineEndingString ifFalse: [
+ 				stream := WriteStream with: self copy.
+ 				stream position: start - 1 ] ].
+ 		stream ifNotNil: [
+ 			stream next: endWithoutDelimiters - start + 1 putAll: self startingAt: start.
+ 			endWithoutDelimiters = end ifFalse: [
+ 				stream nextPutAll: lineEndingString ] ] ].
+ 	^stream
+ 		ifNil: [ self ]
+ 		ifNotNil: [
+ 			stream position = self size
+ 				ifTrue: [ stream originalContents ]
+ 				ifFalse: [ stream contents ] ]!

Item was changed:
  ----- Method: String>>withSqueakLineEndings (in category 'internet') -----
  withSqueakLineEndings
  	"Assume the string is textual, and that CR, LF, and CRLF are all valid line endings.
  	Replace each occurrence with a single CR."
+ 	| cr lf indexLF indexCR |
- 	| cr lf inPos outPos outString newOutPos indexLF indexCR |
  	lf := Character linefeed.
  	indexLF := self indexOf: lf startingAt: 1.
  	indexLF = 0 ifTrue: [^self].
  
  	cr := Character cr.
  	indexCR := self indexOf: cr startingAt: 1.
  	indexCR = 0 ifTrue: [^self copy replaceAll: lf with: cr].
+ 	^self withLineEndings: String cr!
- 	inPos := outPos := 1.
- 	outString := String new: self size.
- 
- 	["check if next CR (if any) is before next LF"
- 	(indexCR > 0 and: [indexCR < indexLF])
- 		ifTrue: [
- 			newOutPos := outPos + 1 + indexCR - inPos.
- 			outString replaceFrom: outPos to: newOutPos - 1 with: self startingAt: inPos.
- 			outPos := newOutPos.
- 			1 + indexCR = indexLF
- 				ifTrue: ["Caught a CR-LF pair"
- 					inPos := 1 + indexLF.
- 					indexLF := self indexOf: lf startingAt: inPos]
- 				ifFalse: [inPos := 1 + indexCR].
- 			indexCR := self indexOf: cr startingAt: inPos]
- 		ifFalse: [
- 			newOutPos := outPos + 1 + indexLF - inPos.
- 			outString replaceFrom: outPos to: newOutPos - 2 with: self startingAt: inPos.
- 			outString at: newOutPos - 1 put: cr.
- 			outPos := newOutPos.
- 			inPos := 1 + indexLF.
- 			indexLF := self indexOf: lf startingAt: inPos].
- 	indexLF = 0]
- 		whileFalse.
- 
- 	"no more LF line endings. copy the rest"
- 	newOutPos := outPos + (self size - inPos + 1).
- 	outString replaceFrom: outPos to: newOutPos - 1 with: self startingAt: inPos.
- 	^outString copyFrom: 1 to: newOutPos - 1!

Item was changed:
  ----- Method: String>>withUnixLineEndings (in category 'internet') -----
  withUnixLineEndings
  	"Assume the string is textual, and that CR, LF, and CRLF are all valid line endings.
  	Replace each occurrence with a single LF."
+ 	| cr lf indexLF indexCR |
- 	| cr lf inPos outPos outString newOutPos indexLF indexCR |
  	cr := Character cr.
  	indexCR := self indexOf: cr startingAt: 1.
  	indexCR = 0 ifTrue: [^self].
  
  	lf := Character linefeed.
  	indexLF := self indexOf: lf startingAt: 1.
  	indexLF = 0 ifTrue: [^self copy replaceAll: cr with: lf].
+ 	^self withLineEndings: String lf!
- 	inPos := outPos := 1.
- 	outString := String new: self size.
- 
- 	["check if next CR is before next LF or if there are no more LF"
- 	(indexLF = 0 or: [indexCR < indexLF])
- 		ifTrue: [
- 			newOutPos := outPos + 1 + indexCR - inPos.
- 			outString replaceFrom: outPos to: newOutPos - 2 with: self startingAt: inPos.
- 			outString at: newOutPos - 1 put: lf.
- 			outPos := newOutPos.
- 			1 + indexCR = indexLF
- 				ifTrue: ["Caught a CR-LF pair"
- 					inPos := 1 + indexLF.
- 					indexLF := self indexOf: lf startingAt: inPos]
- 				ifFalse: [inPos := 1 + indexCR].
- 			indexCR := self indexOf: cr startingAt: inPos]
- 		ifFalse: [
- 			newOutPos := outPos + 1 + indexLF - inPos.
- 			outString replaceFrom: outPos to: newOutPos - 1 with: self startingAt: inPos.
- 			outPos := newOutPos.
- 			inPos := 1 + indexLF.
- 			indexLF := self indexOf: lf startingAt: inPos].
- 	indexCR = 0]
- 		whileFalse.
- 
- 	"no more CR line endings. copy the rest"
- 	newOutPos := outPos + (self size - inPos + 1).
- 	outString replaceFrom: outPos to: newOutPos - 1 with: self startingAt: inPos.
- 	^outString copyFrom: 1 to: newOutPos - 1!
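The following workspace sketch is not part of the commit. It shows what the new #withLineEndings: does, assuming an image with Collections-ul.409 loaded. It uses only selectors that appear in the diff: String class>>#cr, #lf and #crlf, plus the four line-ending methods. Each comparison should answer true when printed. Mixed CR, LF and CRLF terminators are all normalised to the requested terminator. Following the `^stream ifNil: [ self ]` branch, a string whose endings already match is answered as the receiver itself, not as a copy.

```smalltalk
"Workspace sketch. Assumes Collections-ul.409 (String>>#withLineEndings:) is loaded."
| mixed unix |
"One line of each kind: CR, CRLF, LF, and a final line with no terminator."
mixed := 'one', String cr, 'two', String crlf, 'three', String lf, 'four'.

"Every terminator, including the two-character CRLF, becomes a single LF."
unix := mixed withLineEndings: String lf.
unix = ('one', String lf, 'two', String lf, 'three', String lf, 'four').

"Switching to CRLF adds one character for each CR-only or LF-only line."
(mixed withLineEndings: String crlf) size = (mixed size + 2).

"Endings already match: no stream is created, so the receiver itself comes back."
(unix withLineEndings: String lf) == unix.

"The rewritten wrappers agree with #withLineEndings: on a string with both CR and LF..."
mixed withInternetLineEndings = (mixed withLineEndings: String crlf).
mixed withSqueakLineEndings = (mixed withLineEndings: String cr).
mixed withUnixLineEndings = unix.

"...and keep their early exit: with no CR at all, #withUnixLineEndings answers the receiver."
unix withUnixLineEndings == unix.
```

Returning `self` rather than a copy avoids an allocation in the common case where nothing needs to change. Callers that intend to modify the result should copy it first.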
Comment #1 on issue 3347 by stephane.ducasse: simplified and unified String's line-ending changing methods
http://code.google.com/p/pharo/issues/detail?id=3347

Levente Uzonyi uploaded a new version of Collections to project The Trunk:
http://source.squeak.org/trunk/Collections-ul.410.mcz

==================== Summary ====================

Name: Collections-ul.410
Author: ul
Time: 23 November 2010, 8:24:12.434 am
UUID: a68748b5-3380-9645-8686-fc80a9710dc6
Ancestors: Collections-ul.409

- added a translation table to String for exchanging cr and lf characters
- simplified and enhanced String's #withSqueakLineEndings and #withUnixLineEndings

=============== Diff against Collections-ul.409 ===============

Item was changed:
  ArrayedCollection subclass: #String
  	instanceVariableNames: ''
+ 	classVariableNames: 'AsciiOrder CSLineEnders CSNonSeparators CSSeparators CaseInsensitiveOrder CaseSensitiveOrder CrLfExchangeTable HtmlEntities LowercasingTable Tokenish UppercasingTable'
- 	classVariableNames: 'AsciiOrder CSLineEnders CSNonSeparators CSSeparators CaseInsensitiveOrder CaseSensitiveOrder HtmlEntities LowercasingTable Tokenish UppercasingTable'
  	poolDictionaries: ''
  	category: 'Collections-Strings'!
  
  !String commentStamp: '<historical>' prior: 0!
  A String is an indexed collection of Characters. Class String provides the abstract super class for ByteString (that represents an array of 8-bit Characters) and WideString (that represents an array of 32-bit characters). In the similar manner of LargeInteger and SmallInteger, those subclasses are chosen accordingly for a string; namely as long as the system can figure out so, the String is used to represent the given string.
  
  Strings support a vast array of useful methods, which can best be learned by browsing and trying out examples as you find them in the code. Here are a few useful methods to look at...
  	String match:
  	String contractTo:
  
  String also inherits many useful methods from its hierarchy, such as
  	SequenceableCollection ,
  	SequenceableCollection copyReplaceAll:with:
  !

Item was added:
+ ----- Method: String class>>crLfExchangeTable (in category 'accessing') -----
+ crLfExchangeTable
+ 
+ 	^CrLfExchangeTable!

Item was changed:
  ----- Method: String class>>initialize (in category 'initialization') -----
  initialize
  	"self initialize"
  	| order |
  	AsciiOrder := (0 to: 255) as: ByteArray.
  	CaseInsensitiveOrder := AsciiOrder copy.
  	($a to: $z) do:
  		[:c | CaseInsensitiveOrder at: c asciiValue + 1
  				put: (CaseInsensitiveOrder at: c asUppercase asciiValue +1)].
  	"Case-sensitive compare sorts space, digits, letters, all the rest..."
  	CaseSensitiveOrder := ByteArray new: 256 withAll: 255.
  	order := -1.
  	' 0123456789' do: "0..10"
  		[:c | CaseSensitiveOrder at: c asciiValue + 1 put: (order := order+1)].
  	($a to: $z) do: "11-64"
  		[:c | CaseSensitiveOrder at: c asUppercase asciiValue + 1 put: (order := order+1).
  		CaseSensitiveOrder at: c asciiValue + 1 put: (order := order+1)].
  	1 to: CaseSensitiveOrder size do:
  		[:i | (CaseSensitiveOrder at: i) = 255 ifTrue:
  			[CaseSensitiveOrder at: i put: (order := order+1)]].
  	order = 255 ifFalse: [self error: 'order problem'].
  	"a table for translating to lower case"
  	LowercasingTable := String withAll: (Character allByteCharacters collect: [:c | c asLowercase]).
  	"a table for translating to upper case"
  	UppercasingTable := String withAll: (Character allByteCharacters collect: [:c | c asUppercase]).
  	"a table for testing tokenish (for fast numArgs)"
  	Tokenish := String withAll: (Character allByteCharacters collect:
  		[:c | c tokenish ifTrue: [c] ifFalse: [$~]]).
  	"CR and LF--characters that terminate a line"
  	CSLineEnders := CharacterSet crlf.
  	"separators and non-separators"
  	CSSeparators := CharacterSet separators.
+ 	CSNonSeparators := CSSeparators complement.
+ 
+ 	"a table for exchanging cr with lf and vice versa"
+ 	CrLfExchangeTable := Character allByteCharacters collect: [ :each |
+ 		each
+ 			caseOf: {
+ 				[ Character cr ] -> [ Character lf ].
+ 				[ Character lf ] -> [ Character cr ] }
+ 			otherwise: [ each ] ]!
- 	CSNonSeparators := CSSeparators complement.!

Item was changed:
  ----- Method: String>>withSqueakLineEndings (in category 'internet') -----
  withSqueakLineEndings
  	"Assume the string is textual, and that CR, LF, and CRLF are all valid line endings.
  	Replace each occurrence with a single CR."
- 	| cr lf indexLF indexCR |
- 	lf := Character linefeed.
- 	indexLF := self indexOf: lf startingAt: 1.
- 	indexLF = 0 ifTrue: [^self].
- 
- 	cr := Character cr.
- 	indexCR := self indexOf: cr startingAt: 1.
- 	indexCR = 0 ifTrue: [^self copy replaceAll: lf with: cr].
+ 	(self includes: Character lf) ifFalse: [ ^self ].
+ 	(self includes: Character cr) ifFalse: [
+ 		^self translateWith: String crLfExchangeTable ].
  	^self withLineEndings: String cr!

Item was changed:
  ----- Method: String>>withUnixLineEndings (in category 'internet') -----
  withUnixLineEndings
  	"Assume the string is textual, and that CR, LF, and CRLF are all valid line endings.
  	Replace each occurrence with a single LF."
- 	| cr lf indexLF indexCR |
- 	cr := Character cr.
- 	indexCR := self indexOf: cr startingAt: 1.
- 	indexCR = 0 ifTrue: [^self].
- 
- 	lf := Character linefeed.
- 	indexLF := self indexOf: lf startingAt: 1.
- 	indexLF = 0 ifTrue: [^self copy replaceAll: cr with: lf].
+ 	(self includes: Character cr) ifFalse: [ ^self ].
+ 	(self includes: Character lf) ifFalse: [
+ 		^self translateWith: String crLfExchangeTable ].
  	^self withLineEndings: String lf!
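In Collections-ul.410, the CR-only and LF-only fast paths swap the two characters through a lookup table with one entry per byte character, indexed by character value plus one. The following workspace sketch inspects that table. It assumes Collections-ul.410 is loaded and that String class>>#initialize has been re-run, so that CrLfExchangeTable is set. As far as I know, #translateWith: translates the receiver in place in Squeak of that era. For that reason the sketch translates a copy; treat the in-place behaviour as an assumption to check in your own image. Each comparison should answer true when printed.

```smalltalk
"Workspace sketch. Assumes Collections-ul.410 is loaded and String initialize has run."
| table lfOnly crOnly |
table := String crLfExchangeTable.

"Indexed by character value + 1: CR and LF swap, every other character maps to itself."
(table at: Character cr asciiValue + 1) = Character lf.
(table at: Character lf asciiValue + 1) = Character cr.
(table at: $a asciiValue + 1) = $a.

"Translate a copy, so the original string stays untouched even if the translation is in place."
lfOnly := 'one', String lf, 'two'.
crOnly := lfOnly copy translateWith: table.
crOnly = ('one', String cr, 'two').
lfOnly = ('one', String lf, 'two').
```

A single table lookup per character avoids both the #withLineEndings: stream and the repeated #indexOf: scans of the 409 version. It only applies when one of the two characters is absent, because otherwise CRLF pairs would have to collapse to a single character.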
Updates:
	Labels: -Milestone-1.3

Comment #2 on issue 3347 by [hidden email]: simplified and unified String's line-ending changing methods
http://code.google.com/p/pharo/issues/detail?id=3347

(No comment was entered for this change.)