Issue 3347 in pharo: simplified and unified String's line-ending changing methods

pharo
Status: FixedWaitingToBePharoed
Owner: stephane.ducasse
Labels: Milestone-1.3 Type-Squeak

New issue 3347 by stephane.ducasse: simplified and unified String's line-ending changing methods
http://code.google.com/p/pharo/issues/detail?id=3347

Levente Uzonyi uploaded a new version of Collections to project The Trunk:
http://source.squeak.org/trunk/Collections-ul.409.mcz

==================== Summary ====================

Name: Collections-ul.409
Author: ul
Time: 22 November 2010, 1:35:14.026 pm
UUID: 849eb41a-5717-9e44-ab23-eae36fb603bb
Ancestors: Collections-ul.408

- introduced String >> #withLineEndings:
- simplified and unified String's line-ending changing methods: #withInternetLineEndings, #withSqueakLineEndings and #withUnixLineEndings
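
The idea behind #withLineEndings: can be illustrated outside Smalltalk. The following is a minimal Python sketch, not the committed code: it walks the text line by line and starts building a copy only when it meets the first line ending that differs from the requested one, so a string that is already normalized is returned unchanged. The helper names line_indices and with_line_endings are hypothetical, and the treatment of a trailing terminator is an assumption that may differ from Squeak's #lineIndicesDo:.

```python
def line_indices(text):
    """Yield (start, end_without_delims, end) per line, 0-based, end-exclusive.

    CR, LF and CRLF each count as one line ending (assumed to mirror
    the role of #lineIndicesDo: in the diff; details may differ).
    """
    start = i = 0
    n = len(text)
    while i < n:
        ch = text[i]
        if ch == "\r" or ch == "\n":
            end = i + 1
            if ch == "\r" and end < n and text[end] == "\n":
                end += 1  # CRLF is a single delimiter
            yield start, i, end
            start = i = end
        else:
            i += 1
    if start < n:
        yield start, n, n  # last line has no terminator


def with_line_endings(text, target):
    """Return text with every line ending replaced by target."""
    out = None  # stays None until an ending that differs from target appears
    for start, end_wo, end in line_indices(text):
        if out is None and end_wo != end and text[end_wo:end] != target:
            out = [text[:start]]  # untouched prefix is reused as-is
        if out is not None:
            out.append(text[start:end_wo])
            if end_wo != end:
                out.append(target)
    # No differing ending found: hand back the original object.
    return text if out is None else "".join(out)
```

In this sketch with_line_endings(s, "\r\n") plays the part of #withInternetLineEndings, while "\r" and "\n" as targets correspond to the Squeak and Unix variants.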

=============== Diff against Collections-ul.408 ===============

Item was changed:
  ----- Method: String>>withInternetLineEndings (in category 'internet') -----
  withInternetLineEndings
        "change line endings from CR's and LF's to CRLF's.  This is probably in preparation for sending a string over the Internet"

+       ^self withLineEndings: String crlf!
-       ^self class
-               new: self size * 16 // 15 "provisions for CR-LF pairs"
-               streamContents: [ :stream |
-                       self lineIndicesDo: [:start :endWithoutDelimiters :end |
-                               stream next: 1 + endWithoutDelimiters - start putAll: self startingAt: start.
-                               endWithoutDelimiters = end ifFalse: [
-                                       stream crlf ] ] ]!

Item was added:
+ ----- Method: String>>withLineEndings: (in category 'internet') -----
+ withLineEndings: lineEndingString
+
+       | stream |
+       stream := nil.
+       self lineIndicesDo: [ :start :endWithoutDelimiters :end |
+               (stream isNil and: [ endWithoutDelimiters ~= end ]) ifTrue: [
+                       (self copyFrom: endWithoutDelimiters + 1 to: end) = lineEndingString ifFalse: [
+                               stream := WriteStream with: self copy.
+                               stream position: start - 1 ] ].
+               stream ifNotNil: [
+                       stream next: endWithoutDelimiters - start + 1 putAll: self startingAt: start.
+                       endWithoutDelimiters = end ifFalse: [
+                               stream nextPutAll: lineEndingString ] ] ].
+       ^stream
+               ifNil: [ self ]
+               ifNotNil: [
+                       stream position = self size
+                               ifTrue: [ stream originalContents ]
+                               ifFalse: [ stream contents ] ]!

Item was changed:
  ----- Method: String>>withSqueakLineEndings (in category 'internet') -----
  withSqueakLineEndings
        "Assume the string is textual, and that CR, LF, and CRLF are all valid line endings.
        Replace each occurrence with a single CR."
+       | cr lf indexLF indexCR |
-       | cr lf inPos outPos outString newOutPos indexLF indexCR |
        lf := Character linefeed.
        indexLF := self indexOf: lf startingAt: 1.
        indexLF = 0 ifTrue: [^self].

        cr := Character cr.
        indexCR := self indexOf: cr startingAt: 1.
        indexCR = 0 ifTrue: [^self copy replaceAll: lf with: cr].

+       ^self withLineEndings: String cr!
-       inPos := outPos := 1.
-       outString := String new: self size.
-
-       ["check if next CR (if any) is before next LF"
-       (indexCR > 0 and: [indexCR < indexLF])
-               ifTrue: [
-                       newOutPos := outPos + 1 + indexCR - inPos.
-                       outString replaceFrom: outPos to: newOutPos - 1 with: self startingAt: inPos.
-                       outPos := newOutPos.
-                       1 + indexCR = indexLF
-                               ifTrue: ["Caught a CR-LF pair"
-                                       inPos := 1 + indexLF.
-                                       indexLF := self  indexOf: lf startingAt: inPos]
-                               ifFalse: [inPos := 1 + indexCR].
-                       indexCR := self indexOf: cr startingAt: inPos]
-               ifFalse: [
-                       newOutPos := outPos + 1 + indexLF - inPos.
-                       outString replaceFrom: outPos to: newOutPos - 2 with: self startingAt: inPos.
-                       outString at: newOutPos - 1 put: cr.
-                       outPos := newOutPos.
-                       inPos := 1 + indexLF.
-                       indexLF := self indexOf: lf startingAt: inPos].
-       indexLF = 0]
-               whileFalse.
-
-       "no more LF line endings.  copy the rest"
-       newOutPos := outPos + (self size - inPos + 1).
-       outString replaceFrom: outPos to: newOutPos - 1 with: self startingAt: inPos.
-       ^outString copyFrom: 1 to: newOutPos - 1!

Item was changed:
  ----- Method: String>>withUnixLineEndings (in category 'internet') -----
  withUnixLineEndings
        "Assume the string is textual, and that CR, LF, and CRLF are all valid line endings.
        Replace each occurrence with a single LF."
+       | cr lf indexLF indexCR |
-       | cr lf inPos outPos outString newOutPos indexLF indexCR |
        cr := Character cr.
        indexCR := self indexOf: cr startingAt: 1.
        indexCR = 0 ifTrue: [^self].

        lf := Character linefeed.
        indexLF := self indexOf: lf startingAt: 1.
        indexLF = 0 ifTrue: [^self copy replaceAll: cr with: lf].

+       ^self withLineEndings: String lf!
-       inPos := outPos := 1.
-       outString := String new: self size.
-
-       ["check if next CR is before next LF or if there are no more LF"
-       (indexLF = 0 or: [indexCR < indexLF])
-               ifTrue: [
-                       newOutPos := outPos + 1 + indexCR - inPos.
-                       outString replaceFrom: outPos to: newOutPos - 2 with: self startingAt: inPos.
-                       outString at: newOutPos - 1 put: lf.
-                       outPos := newOutPos.
-                       1 + indexCR = indexLF
-                               ifTrue: ["Caught a CR-LF pair"
-                                       inPos := 1 + indexLF.
-                                       indexLF := self  indexOf: lf startingAt: inPos]
-                               ifFalse: [inPos := 1 + indexCR].
-                       indexCR := self indexOf: cr startingAt: inPos]
-               ifFalse: [
-                       newOutPos := outPos + 1 + indexLF - inPos.
-                       outString replaceFrom: outPos to: newOutPos - 1 with: self startingAt: inPos.
-                       outPos := newOutPos.
-                       inPos := 1 + indexLF.
-                       indexLF := self indexOf: lf startingAt: inPos].
-       indexCR = 0]
-               whileFalse.
-
-       "no more CR line endings.  copy the rest"
-       newOutPos := outPos + (self size - inPos + 1).
-       outString replaceFrom: outPos to: newOutPos - 1 with: self startingAt: inPos.
-       ^outString copyFrom: 1 to: newOutPos - 1!


Re: Issue 3347 in pharo: simplified and unified String's line-ending changing methods

pharo

Comment #1 on issue 3347 by stephane.ducasse: simplified and unified String's line-ending changing methods
http://code.google.com/p/pharo/issues/detail?id=3347

Levente Uzonyi uploaded a new version of Collections to project The Trunk:
http://source.squeak.org/trunk/Collections-ul.410.mcz

==================== Summary ====================

Name: Collections-ul.410
Author: ul
Time: 23 November 2010, 8:24:12.434 am
UUID: a68748b5-3380-9645-8686-fc80a9710dc6
Ancestors: Collections-ul.409

- added a translation table to String for exchanging cr and lf characters
- simplified and enhanced String's #withSqueakLineEndings and #withUnixLineEndings
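
The role of the exchange table can be shown with a short Python sketch, offered as an illustration rather than a port of the commit. It assumes the same three cases as the changed methods: nothing to convert, only the "wrong" character present (a single table-driven pass that swaps cr and lf), and mixed endings. In the commit the mixed case delegates to #withLineEndings:; here plain replace calls stand in for it so the block is self-contained. The names CR_LF_EXCHANGE, with_unix_line_endings and with_squeak_line_endings are hypothetical.

```python
CR, LF = "\r", "\n"

# Translation table that swaps CR and LF and leaves everything else alone.
CR_LF_EXCHANGE = str.maketrans({CR: LF, LF: CR})


def with_unix_line_endings(text):
    """Replace CR, LF and CRLF endings with a single LF."""
    if CR not in text:
        return text  # already LF-only (or no endings at all)
    if LF not in text:
        return text.translate(CR_LF_EXCHANGE)  # CR-only: one table pass
    # Mixed endings: stand-in for the general withLineEndings: path.
    return text.replace(CR + LF, LF).replace(CR, LF)


def with_squeak_line_endings(text):
    """Replace CR, LF and CRLF endings with a single CR."""
    if LF not in text:
        return text  # already CR-only (or no endings at all)
    if CR not in text:
        return text.translate(CR_LF_EXCHANGE)  # LF-only: one table pass
    return text.replace(CR + LF, CR).replace(LF, CR)
```

The table swaps both characters, but each fast path runs only when one of them is absent, so the swap behaves as a one-way replacement and both methods can share the same table.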

=============== Diff against Collections-ul.409 ===============

Item was changed:
  ArrayedCollection subclass: #String
        instanceVariableNames: ''
+       classVariableNames: 'AsciiOrder CSLineEnders CSNonSeparators CSSeparators CaseInsensitiveOrder CaseSensitiveOrder CrLfExchangeTable HtmlEntities LowercasingTable Tokenish UppercasingTable'
-       classVariableNames: 'AsciiOrder CSLineEnders CSNonSeparators CSSeparators CaseInsensitiveOrder CaseSensitiveOrder HtmlEntities LowercasingTable Tokenish UppercasingTable'
        poolDictionaries: ''
        category: 'Collections-Strings'!

  !String commentStamp: '<historical>' prior: 0!
  A String is an indexed collection of Characters. Class String provides the abstract super class for ByteString (that represents an array of 8-bit Characters) and WideString (that represents an array of  32-bit characters).  In the similar manner of LargeInteger and SmallInteger, those subclasses are chosen accordingly for a string; namely as long as the system can figure out so, the String is used to represent the given string.

  Strings support a vast array of useful methods, which can best be learned by browsing and trying out examples as you find them in the code.

  Here are a few useful methods to look at...
        String match:
        String contractTo:

  String also inherits many useful methods from its hierarchy, such as
        SequenceableCollection ,
        SequenceableCollection copyReplaceAll:with:
  !

Item was added:
+ ----- Method: String class>>crLfExchangeTable (in category 'accessing') -----
+ crLfExchangeTable
+
+       ^CrLfExchangeTable!

Item was changed:
  ----- Method: String class>>initialize (in category 'initialization') -----
  initialize   "self initialize"

        | order |
        AsciiOrder := (0 to: 255) as: ByteArray.

        CaseInsensitiveOrder := AsciiOrder copy.
        ($a to: $z) do:
                [:c | CaseInsensitiveOrder at: c asciiValue + 1
                                put: (CaseInsensitiveOrder at: c asUppercase asciiValue +1)].

        "Case-sensitive compare sorts space, digits, letters, all the rest..."
        CaseSensitiveOrder := ByteArray new: 256 withAll: 255.
        order := -1.
        ' 0123456789' do:  "0..10"
                [:c | CaseSensitiveOrder at: c asciiValue + 1 put: (order := order+1)].
        ($a to: $z) do:     "11-64"
                [:c | CaseSensitiveOrder at: c asUppercase asciiValue + 1 put: (order := order+1).
                CaseSensitiveOrder at: c asciiValue + 1 put: (order := order+1)].
        1 to: CaseSensitiveOrder size do:
                [:i | (CaseSensitiveOrder at: i) = 255 ifTrue:
                        [CaseSensitiveOrder at: i put: (order := order+1)]].
        order = 255 ifFalse: [self error: 'order problem'].

        "a table for translating to lower case"
        LowercasingTable := String withAll: (Character allByteCharacters collect: [:c | c asLowercase]).

        "a table for translating to upper case"
        UppercasingTable := String withAll: (Character allByteCharacters collect: [:c | c asUppercase]).

        "a table for testing tokenish (for fast numArgs)"
        Tokenish := String withAll: (Character allByteCharacters collect:
                                                                        [:c | c tokenish ifTrue: [c] ifFalse: [$~]]).

        "CR and LF--characters that terminate a line"
        CSLineEnders := CharacterSet crlf.

        "separators and non-separators"
        CSSeparators := CharacterSet separators.
+       CSNonSeparators := CSSeparators complement.
+
+       "a table for exchanging cr with lf and vice versa"
+       CrLfExchangeTable := Character allByteCharacters collect: [ :each |
+               each
+                       caseOf: {
+                               [ Character cr ] -> [ Character lf ].
+                               [ Character lf ] -> [ Character cr ] }
+                       otherwise: [ each ] ]!
-       CSNonSeparators := CSSeparators complement.!

Item was changed:
  ----- Method: String>>withSqueakLineEndings (in category 'internet') -----
  withSqueakLineEndings
        "Assume the string is textual, and that CR, LF, and CRLF are all valid line endings.
        Replace each occurrence with a single CR."
-       | cr lf indexLF indexCR |
-       lf := Character linefeed.
-       indexLF := self indexOf: lf startingAt: 1.
-       indexLF = 0 ifTrue: [^self].
-
-       cr := Character cr.
-       indexCR := self indexOf: cr startingAt: 1.
-       indexCR = 0 ifTrue: [^self copy replaceAll: lf with: cr].

+       (self includes: Character lf) ifFalse: [ ^self ].
+       (self includes: Character cr) ifFalse: [
+               ^self translateWith: String crLfExchangeTable ].
        ^self withLineEndings: String cr!

Item was changed:
  ----- Method: String>>withUnixLineEndings (in category 'internet') -----
  withUnixLineEndings
        "Assume the string is textual, and that CR, LF, and CRLF are all valid line endings.
        Replace each occurrence with a single LF."
-       | cr lf indexLF indexCR |
-       cr := Character cr.
-       indexCR := self indexOf: cr startingAt: 1.
-       indexCR = 0 ifTrue: [^self].
-
-       lf := Character linefeed.
-       indexLF := self indexOf: lf startingAt: 1.
-       indexLF = 0 ifTrue: [^self copy replaceAll: cr with: lf].

+       (self includes: Character cr) ifFalse: [ ^self ].
+       (self includes: Character lf) ifFalse: [
+               ^self translateWith: String crLfExchangeTable ].
        ^self withLineEndings: String lf!


Re: Issue 3347 in pharo: simplified and unified String's line-ending changing methods

pharo
Updates:
        Labels: -Milestone-1.3

Comment #2 on issue 3347 by [hidden email]: simplified and unified String's line-ending changing methods
http://code.google.com/p/pharo/issues/detail?id=3347

(No comment was entered for this change.)