[ANN] NeoCSV


Sven Van Caekenberghe
CSV (Comma Separated Values) and, more generally, other delimiter-separated-value formats such as TSV are probably the most common data exchange formats. A number of implementations of this simple format already exist.

NeoCSV is a more flexible and more efficient reader and writer for this format.

You can find it in

        http://mc.stfx.eu/Neo

ConfigurationOfNeoCSV is also available from

        http://www.squeaksource.com/MetacelloRepository
        http://squeaksource.com/MetaRepoForPharo14
        http://ss3.gemstone.com/ss/MetaRepoForPharo20

Documentation can be found here

        https://github.com/svenvc/docs/blob/master/neo/neo-csv-paper.md

Feedback and users are welcome.


Sven


--
Sven Van Caekenberghe
http://stfx.eu
Smalltalk is the Red Pill

Re: [ANN] NeoCSV

Tudor Girba-2
This is nice. Thanks!

Btw, what is the license? :)

Doru


On 22 Jun 2012, at 21:01, Sven Van Caekenberghe wrote:

> CSV (Comma Separated Values) and more generally other delimiter-separated-value formats like TSV are probably the most common data exchange format. A number of implementations of this simple format already exist.
>
> NeoCSV is a more flexible and more efficient reader and writer for this format.
>
> You can find it in
>
> http://mc.stfx.eu/Neo
>
> ConfigurationOfNeoCSV is also available from
>
> http://www.squeaksource.com/MetacelloRepository
> http://squeaksource.com/MetaRepoForPharo14
> http://ss3.gemstone.com/ss/MetaRepoForPharo20
>
> Documentation can be found here
>
> https://github.com/svenvc/docs/blob/master/neo/neo-csv-paper.md
>
> Feedback and users are welcome.
>
>
> Sven
>
>
> --
> Sven Van Caekenberghe
> http://stfx.eu
> Smalltalk is the Red Pill
>

--
www.tudorgirba.com

"What we can governs what we wish."




Re: [ANN] NeoCSV

Sven Van Caekenberghe
MIT, like all my public stuff. I should add a note somewhere, I guess.

On 24 Jun 2012, at 10:31, Tudor Girba wrote:

> This is nice. Thanks!
>
> Btw, what is the license? :)
>
> Doru


Re: [ANN] NeoCSV

Tudor Girba-2
I was just teasing :). But, indeed, a note would be useful.

Cheers,
Doru



On Sun, Jun 24, 2012 at 12:28 PM, Sven Van Caekenberghe wrote:

> MIT, like all my public stuff, I should add a note somewhere I guess.



--
www.tudorgirba.com

"Every thing has its own flow"

Re: [ANN] NeoCSV

Sven Van Caekenberghe
Doru,

On 26 Jun 2012, at 14:21, Tudor Girba wrote:

> I was just teasing :).

Now that I have met you in real life, I really can't imagine you would do something like that, teasing people just for the fun of it ;-)

Actually, just yesterday I was further optimizing NeoCSV on an actual example, reading a 3.3 MB file with 140,000 entries like:

16777216,17301503,AU
17367040,17432575,MY
17435136,17435391,AU
17498112,17563647,KR
17563648,17825791,CN
17825792,18087935,KR
18153472,18219007,JP

The old code simply did this:

readFrom: filename
        "Read from a CSV with field start,stop,code as in 3651886848,3651887103,BE"
        "self readFrom: '/Users/sven/Tmp/geo-ip-country/GeoIPCountry.csv'."
       
        | instance data |
        instance := self new.
        data := OrderedCollection new: 145000.
        FileStream oldFileNamed: filename do: [ :stream |
                [ stream atEnd ] whileFalse: [ | tokens range |
                        tokens := stream nextLine findTokens: ','.
                        range := IPAddressRangeCountry
                                from: tokens first asNumber
                                to: tokens second asNumber
                                country: tokens third asSymbol.
                        data add: range ] ].
        instance data: data asArray.
        ^ instance
       
The new code using NeoCSV is this:

readFrom: filename
        "Read from a CSV with field start,stop,code as in 3651886848,3651887103,BE"
        "self readFrom: '/Users/sven/Tmp/geo-ip-country/GeoIPCountry.csv'."
       
        ^ self new
                data: (FileStream oldFileNamed: filename do: [ :stream |
                        (NeoCSVReader on: stream)
                                recordClass: IPAddressRangeCountry;
                                addIntegerField: #start: ;
                                addIntegerField: #stop: ;
                                addSymbolField: #country: ;
                                upToEnd ]);
                yourself

The new code is simpler, does more internally, and is faster (2.5 seconds versus 3.5 seconds for the old code).
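
For anyone who wants to try the reader without a file or a record class, here is a minimal sketch. It assumes NeoCSV is loaded in the image; the two input lines are copied from the sample above, and the variable names are only illustrative. Without recordClass:, each record should come back as an Array of the converted fields.

```smalltalk
| input records |
"Two sample lines from the data above, joined by a carriage return"
input := '16777216,17301503,AU' , String cr , '17367040,17432575,MY'.
"Convert the first two fields to integers and keep the third as a String"
records := (NeoCSVReader on: input readStream)
	addIntegerField;
	addIntegerField;
	addField;
	upToEnd.
"Show one record per input line"
records do: [ :each | Transcript show: each printString; cr ]
```

Leaving out recordClass: keeps the example independent of IPAddressRangeCountry, which is specific to the code above.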

Yes I am happy, and looking for users ;-)

Now I am going back to struggling with Metacello.

Sven

--
Sven Van Caekenberghe
http://stfx.eu
Smalltalk is the Red Pill



Re: [ANN] NeoCSV

Sean P. DeNigris
Sven Van Caekenberghe wrote
> Now I am going back to struggling with Metacello.

Let me know if I can help...

Cheers,
Sean
Re: [ANN] NeoCSV

Paul DeBruicker
In reply to this post by Sven Van Caekenberghe
Hi Sven.
Just wanted to make sure you knew about this metacello toolbox guide:

http://seandenigris.com/blog/?p=844



On Jun 26, 2012, at 5:45 AM, Sven Van Caekenberghe wrote:

> Now I am going back to struggling with Metacello.

Re: [ANN] NeoCSV

Sven Van Caekenberghe
In reply to this post by Sean P. DeNigris

On 26 Jun 2012, at 16:56, Sean P. DeNigris wrote:

> Sven Van Caekenberghe wrote
>>
>> Now I am going back to struggling with Metacello.
>>
>
> Let me know if I can help...
>
> Sean

Thanks, Sean. It is just that I postponed using Metacello for so long, and now I have finally forced myself to use it. I am learning slowly because I want to understand what I am doing. So far it is going well. I'll certainly ask for help if I get into trouble.

Sven
Re: [ANN] NeoCSV

Sven Van Caekenberghe
In reply to this post by Paul DeBruicker

On 26 Jun 2012, at 17:38, Paul DeBruicker wrote:

> Hi Sven.
> Just wanted to make sure you knew about this metacello toolbox guide:
>
> http://seandenigris.com/blog/?p=844

Yes, Paul. I saw that.

I am learning slowly because I want to understand what I am doing;
that is why I prefer the manual route.

So far it is going well. I'll certainly ask for help if I get into trouble.

Sven

PS: And thanks again for doing the Zn config for so long!
