String lines doesn't handle separating Excel copy-and-paste very well,
because soft enters in a cell get split into separate lines.

I've done now:

splitIntoLines: aString
    "Return a collection with the string-lines of the receiver."

    | input char temp inQuote |
    input := aString readStream.
    ^ Array streamContents: [ :output |
        temp := ''.
        inQuote := false.
        [ input atEnd ] whileFalse: [
            char := input next.
            char = $" ifTrue: [
                inQuote ifTrue: [
                    input peek = (Character tab) ifTrue: [
                        char := input next ].
                    input peek = (Character cr) ifTrue: [
                        char := input next.
                        inQuote := false ] ] ].
            char = (Character tab) ifTrue: [
                inQuote
                    ifTrue: [ inQuote := false ]
                    ifFalse: [
                        input peek = $" ifTrue: [
                            input next.
                            inQuote := true ] ] ].
            char = (Character cr)
                ifFalse: [ temp := temp, char asString ]
                ifTrue: [
                    inQuote ifFalse: [
                        output nextPut: temp.
                        temp := ''.
                        input peek = Character lf ifTrue: [ input next ] ] ] ] ]

I would be interested in (speed & elegance & mistakes) improvements to it.

Stephan
_______________________________________________
Beginners mailing list
[hidden email]
http://lists.squeakfoundation.org/mailman/listinfo/beginners
[Newbies] Splitting Excel csv into lines
***
>stephan at stack.nl stephan at stack.nl
>Wed Dec 24 13:32:34 UTC 2008
>
>String lines doesn't handle separating Excel copy-and-paste very well,
>because soft enters in a cell get split into separate lines.
>
>I've done now:
>
>splitIntoLines: aString
>    "Return a collection with the string-lines of the receiver."
>
>    | input char temp inQuote |
>    input := aString readStream.
>    ^ Array streamContents: [ :output |
>        temp := ''.
>        inQuote := false.
>        [ input atEnd ] whileFalse: [
>            char := input next.
>            char = $" ifTrue: [
>                inQuote ifTrue: [
>                    input peek = (Character tab) ifTrue: [
>                        char := input next ].
>                    input peek = (Character cr) ifTrue: [
>                        char := input next.
>                        inQuote := false ] ] ].
>            char = (Character tab) ifTrue: [
>                inQuote
>                    ifTrue: [ inQuote := false ]
>                    ifFalse: [
>                        input peek = $" ifTrue: [
>                            input next.
>                            inQuote := true ] ] ].
>            char = (Character cr)
>                ifFalse: [ temp := temp, char asString ]
>                ifTrue: [
>                    inQuote ifFalse: [
>                        output nextPut: temp.
>                        temp := ''.
>                        input peek = Character lf ifTrue: [ input next ] ] ] ] ]
>
>I would be interested in (speed & elegance & mistakes) improvements to it.

Hi Stephan,

A quick look at your code shows it to be "hard to read". From the outside I had a hard time figuring out what it is trying to do. Actually I gave up. My guess is this has a 50-50 chance of doing what you want in all cases.

So

step 1: Describe the rules for breaking up Excel lines in a comment.

step 2: Now that you have code to do the work, prove it works. Write several example lines for it to break up. (My assistant Puck says: "write really devious examples", by which he means examples that will be hard to break up.) Have it break them up. Write a test asserting that the input lines break up into their output components. (Look at other SUnit tests for examples.)

Revising for readability

A lot is done to keep track of quote state. I would write a separate method for dealing with the input while in quote state. As an argument you can pass the input stream and possibly the output stream. When it returns you would no longer be in quote mode and the streams would be updated.

Once this is done, run the tests again. Do they still work?

Revising for speed

Note: do NOT work on this first. Work on speed after you have assured you get the correct results.

What you need to know about [temp := temp , char asString] is that , will copy the string each time. Building a string up character by character will be Sloooow.

So make temp a WriteStream of characters and build it by doing nextPut: . Retrieve the string by temp contents.

After you do this step, run the tests again. Do they still work?

Hth,

Yours in curiosity and service, --Jerome Peace
***
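Jerome's suggestion to build each line with a stream can be tried out in a workspace. The snippet below is only a sketch under assumptions: the variable name temp is taken from the original method, and the sample string 'abc' is made up for illustration.

```smalltalk
"Workspace sketch, assuming a Squeak image: collect characters in a
 WriteStream instead of copying the whole string with , at every step."
| temp |
temp := WriteStream on: String new.
'abc' do: [:each | temp nextPut: each].
temp contents.   "the accumulated string, here 'abc'"
temp := WriteStream on: String new.   "start a fresh buffer for the next line"
```

Inside splitIntoLines: this would mean replacing temp := temp, char asString with temp nextPut: char, and handing temp contents to the output when a line ends.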
Jerome: thank you for reminding me to write this test-driven.
Stephan
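A first pair of SUnit tests in the spirit of Jerome's step 2 might look like the sketch below. The class ExcelLineSplitter, the test class name and the category are hypothetical, since the thread does not say which class holds splitIntoLines:. The quoted-cell input is likewise an assumption about what Excel puts on the clipboard.

```smalltalk
"Evaluate once to create the test class. ExcelLineSplitter and the
 category name are hypothetical; substitute your own."
TestCase subclass: #ExcelLineSplitterTest
	instanceVariableNames: ''
	classVariableNames: ''
	poolDictionaries: ''
	category: 'ExcelPaste-Tests'.

"Add the following two methods to ExcelLineSplitterTest in the browser."

testPlainRows
	"Two rows without quotes should come back as two lines."
	| tab cr lines |
	tab := String with: Character tab.
	cr := String with: Character cr.
	lines := ExcelLineSplitter new splitIntoLines: 'a', tab, 'b', cr, 'c', cr.
	self assert: lines size = 2.
	self assert: lines first = ('a', tab, 'b').
	self assert: lines last = 'c'

testSoftEnterInQuotedCell
	"Assumption: a cell containing a soft enter arrives wrapped in quote
	 characters. The row must not be split at the embedded cr."
	| tab cr lines |
	tab := String with: Character tab.
	cr := String with: Character cr.
	lines := ExcelLineSplitter new splitIntoLines: 'a', tab, '"b', cr, 'c"', cr.
	self assert: lines size = 1
```

The second test deliberately checks only the number of lines. What should happen to the embedded cr itself is a rule that still has to be written down.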
Hi Stephan,
It's a bit hard to understand, especially without tests. I would probably split it into a few methods such that each method tells the reader a manageable chunk of what is going on. I've sketched a few other things as well to give you some ideas:

splitIntoLines: aString
    | in |
    in := aString readStream.
    ^ Array streamContents: [:out |
        [in atEnd] whileFalse: [
            out nextPut: (self readLine: in) ] ]

readLine: aStream
    inQuote := false. "instance variable"
    "Use a stream to build the string instead of #,"
    ^ String streamContents: [:out | | char |
        "Give clear indication of what an end of line is"
        [((char := aStream next) = Character cr) and: [inQuote not]]
            whileFalse: [self readNext: aStream onto: out].
        "Use stream methods where you can"
        aStream peekFor: Character lf]

readNext: inStream onto: outStream
    | char |
    char := inStream last. "the loop calls next for us"
    "This used to be ifTrue: [ ifTrue: [ with no ifFalse:"
    (char = $" and: [inQuote]) ifTrue: [
        "Common tasks given to helper methods"
        char := self lookFor: Character tab in: inStream.
        char := self lookFor: Character cr in: inStream ifFound: [inQuote := false] ].
    char = (Character tab) ifTrue: [
        inQuote
            ifTrue: [inQuote := false]
            ifFalse: [self lookFor: $" in: inStream ifFound: [inQuote := true]] ].
    "BTW, not sure this was right - don't you want CRs that aren't part of EOLs?"
    char = (Character cr) ifFalse: [outStream nextPut: char]

lookFor: aCharacter in: aStream
    ^ self lookFor: aCharacter in: aStream ifFound: []

lookFor: aCharacter in: aStream ifFound: aBlock
    aStream peek = aCharacter ifTrue: [
        aBlock value.
        ^ aStream next ].
    ^ aStream last

Hope this helps,
Zulq.
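For readers who have not used peekFor: before, here is a small workspace sketch of what it does. The two-character sample string is made up for illustration.

```smalltalk
"Workspace sketch, assuming a Squeak image: peekFor: consumes the next
 element only when it equals the argument, and answers whether it did."
| in skipped |
in := (String with: Character lf with: $x) readStream.
skipped := in peekFor: Character lf.   "true; the stream is now past the lf"
skipped := in peekFor: Character lf.   "false; $x is next and stays unread"
in next   "$x"
```

That is why a single aStream peekFor: Character lf can stand in for the peek, compare and next sequence of the original method.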