Splitting Excel csv into lines

Splitting Excel csv into lines

Stephan Eggermont-3
String lines doesn't handle text copied and pasted from Excel very well,
because soft enters inside a cell get split into separate lines.

Here is what I've done so far:

splitIntoLines: aString
        "Return a collection with the string-lines of the receiver."

        | input char temp inQuote|
        input := aString readStream.
        ^ Array streamContents: [ :output |
                temp := ''.
                inQuote := false.
                [ input atEnd ] whileFalse: [
                        char := input next.
                        char = $" ifTrue: [
                                inQuote ifTrue:[
                                        input peek = (Character tab) ifTrue: [
                                                char := input next.].
                                        input peek = (Character cr) ifTrue: [
                                                char := input next.
                                                inQuote := false]]].
                        char = (Character tab) ifTrue:[
                                inQuote ifTrue: [ inQuote := false]
                                ifFalse: [
                                        input peek= $" ifTrue: [
                                                input next.
                                                inQuote:=true]]].
                        char = (Character cr)
                                ifFalse: [temp := temp, char asString]
                                ifTrue: [
                                        inQuote ifFalse: [
                                                output nextPut: temp.
                                                temp:=''.
                                                input peek = Character lf ifTrue: [input next]]]]]

I would be interested in improvements to it (speed, elegance, mistakes).

Stephan
_______________________________________________
Beginners mailing list
[hidden email]
http://lists.squeakfoundation.org/mailman/listinfo/beginners

Re: Splitting Excel csv into lines

Jerome Peace
>stephan at stack.nl stephan at stack.nl
>Wed Dec 24 13:32:34 UTC 2008

Hi Stephan,

A quick look at your code shows it to be "hard to read".
From the outside I had a hard time figuring out what it is trying to do.
Actually I gave up.
My guess is this has a 50-50 chance of doing what you want in
all cases.

So, step 1:
Describe the rules for breaking up Excel lines in a comment.
Step 2:
Now that you have code to do the work, prove it works.
Write several example lines for it to break up.
(My assistant Puck says: "Write really devious examples",
by which he means examples that will be hard to break up.)

Have it break them up.
Write a test asserting that the input lines break up into their output components. (Look at other SUnit tests for examples.)
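
As a rough idea of what such a test could look like, here is a minimal SUnit sketch. ExcelLineSplitter, ExcelLineSplitterTest and the category name are hypothetical; substitute whichever class actually holds splitIntoLines:. The expected values are only my reading of the intended rules (tab separates cells, cr ends a row, a quoted cell may contain a soft enter), so adjust them to the rules you write down in step 1. The class definition is evaluated once; the methods are entered in the browser.

```smalltalk
"Hypothetical test class. ExcelLineSplitter stands in for whichever
 class really implements splitIntoLines:."
TestCase subclass: #ExcelLineSplitterTest
    instanceVariableNames: 'splitter tab cr'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'ExcelImport-Tests'

setUp
    splitter := ExcelLineSplitter new.
    tab := String with: Character tab.
    cr := String with: Character cr

testPlainRows
    "Two rows of two cells each, no quoting."
    | lines |
    lines := splitter splitIntoLines: 'a', tab, 'b', cr, 'c', tab, 'd', cr.
    self assert: lines size = 2.
    self assert: lines first = ('a', tab, 'b').
    self assert: lines last = ('c', tab, 'd')

testSoftEnterStaysInsideItsRow
    "A quoted cell containing a soft enter must not start a new row."
    | lines |
    lines := splitter splitIntoLines: 'a', tab, '"x', cr, 'y"', cr, 'b', cr.
    self assert: lines size = 2

testQuotedFirstCell
    "Devious: the quoted cell opens the row, so no tab precedes the quote."
    | lines |
    lines := splitter splitIntoLines: '"x', cr, 'y"', tab, 'b', cr.
    self assert: lines size = 1
```

The last test is the kind of input Puck has in mind: it may well fail at first, which is exactly what you want a test to tell you.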

Revising for readability:
A lot of the code is there to keep track of the quote state.
I would write a separate method for dealing with the input while in the quote state.
As arguments you can pass the input stream and possibly the output stream.
When it returns you are no longer in quote mode and the streams have been updated.
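
To make that concrete, here is a workspace sketch of such a helper, written as a block so it can be tried with "do it". In real code the block body would become a method (say, a hypothetical readQuotedCellFrom:onto:). The closing rule is an assumption on my part: a quote ends the cell only when a tab, a cr or the end of the input follows it.

```smalltalk
"Workspace sketch; all names are illustrative only."
| readQuoted in out tab cr |
tab := Character tab.
cr := Character cr.
readQuoted := [:input :output | | char closed |
    closed := false.
    [closed or: [input atEnd]] whileFalse: [
        char := input next.
        "Assumed rule: a quote closes the cell only before tab, cr or end of input."
        (char = $" and: [input atEnd or: [(input peek = tab) or: [input peek = cr]]])
            ifTrue: [closed := true]
            ifFalse: [output nextPut: char]]].

"The caller has already consumed the opening quote."
in := ReadStream on: 'soft', (String with: cr), 'enter"', (String with: tab), 'next'.
out := WriteStream on: String new.
readQuoted value: in value: out.
out contents.   "the cell text, soft enter included"
in peek = tab.  "the caller resumes at the cell separator"
```

Everything about quote handling now lives in one place, so the main loop only has to notice an opening quote and hand over the streams.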

Once this is done run the tests again.
Do they still work?

Revising for speed:
Note: do NOT work on this first.
Work on speed only after you have made sure you get the correct results.

What you need to know about
 [temp := temp , char asString]
is that #, copies the whole string each time.
Building a string up character by character this way will be slow.
So make temp a WriteStream on a String and build it with nextPut:.
Retrieve the string with temp contents.
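
A small workspace comparison of the two approaches, as a sketch only. The 20000-character test string is an arbitrary choice and the timings will vary from machine to machine.

```smalltalk
| source slow fast viaConcat viaStream |
source := String new: 20000 withAll: $x.

"Concatenation: every #, allocates a new string and copies the old one."
viaConcat := Time millisecondsToRun: [
    slow := ''.
    source do: [:each | slow := slow , each asString]].

"Stream: characters are appended to a buffer that grows as needed."
viaStream := Time millisecondsToRun: [
    fast := WriteStream on: String new.
    source do: [:each | fast nextPut: each]].

Transcript
    show: 'concat: ', viaConcat printString, ' ms; stream: ', viaStream printString, ' ms';
    cr.
fast contents = slow.   "both approaches build the same string"
```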

After you do this step, run the tests again.
Do they still work?

Hth,

Yours in curiosity and service, --Jerome Peace

Re: Splitting Excel csv into lines

Stephan Eggermont-3
Jerome: thank you for reminding me to write this test-driven.

Stephan

Re: Splitting Excel csv into lines

Zulq Alam-2
Hi Stephan,

It's a bit hard to understand. Especially without tests. I would
probably split it into a few methods such that each method tells the
reader a manageable chunk of what is going on. I've sketched a few other
things as well to give you some ideas:

splitIntoLines: aString
   | in |
   in := aString readStream.
   ^ Array streamContents: [:out |
     [in atEnd] whileFalse: [
       out nextPut: (self readLine: in)
     ]
   ]


readLine: aStream
   inQuote := false. "instance variable "
   " Use a stream to build the string instead of #, "
   ^ String streamContents: [:out |
     | char |
     " Give clear indication of what an end of line is "
     [((char := aStream next) = Character cr) and: [inQuote not]]
       whileFalse: [self readNext: aStream onto: out].
     " Use stream methods where you can"
     aStream peekFor: Character lf]


readNext: inStream onto: outStream
   | char |
   char := inStream last. " the loop calls next for us "
   " This used to be ifTrue: [ ifTrue: [ with no ifFalse: "
   (char = $" and: [inQuote]) ifTrue:[
     " Common tasks given to helper methods "
     char := self lookFor: Character tab in: inStream.
     char := self lookFor: Character cr in: inStream
       ifFound: [inQuote := false].
   ].
   char = (Character tab) ifTrue:[
     inQuote ifTrue: [inQuote := false]
       ifFalse: [
         " A quote right after a tab opens a quoted cell "
         self lookFor: $" in: inStream
           ifFound: [inQuote := true].
       ]
   ].
   " BTW, not sure this was right - don't you want CRs
   that aren't part of EOLs? "
   char = (Character cr) ifFalse: [outStream nextPut: char]


lookFor: aCharacter in: aStream
   ^ self lookFor: aCharacter in: aStream ifFound: []


lookFor: aCharacter in: aStream ifFound: aBlock
   aStream peek = aCharacter ifTrue: [
     aBlock value.
     ^ aStream next.
   ].
   ^ aStream last


Hope this helps,

Zulq.
