Dear all,
We are currently setting up a small ROASSAL team to participate in the #Datathon Data for Development:
http://simplon.co/datathon-data-for-development-rdv-les-7-et-8-avril-a-montreuil/

We are looking for ways to load big CSV tables into a Pharo image. Apparently some of the CSV files provided will be huge (around 5 GB for one month of data). The format of the data is described here:
http://arxiv.org/abs/1407.4885

Is it possible with NeoCSV to read only a fraction of the lines, based on some conditions?

If some people want to help online, we can set up a chat to coordinate.

Regards,
--
Serge Stinckwich
UCBN & UMI UMMISCO 209 (IRD/UPMC)
Every DSL ends up being Smalltalk
http://www.doesnotunderstand.org/
> On 04.04.2015 at 19:23, Serge Stinckwich wrote:
>
> Is it possible with NeoCSV to read only a fraction of the lines,
> based on some conditions?

The NeoCSVReader supports the necessary stream protocol. Once you have set up the CSV reader, you can call #next on it and filter by condition. There is also #atEnd, so a simple loop should do. But I have never used the CSV reader myself, so Sven might have much better options.

Norbert
There are also #select: and #select:thenDo: convenience methods.
NeoCSV is properly streaming, so it should not introduce memory consumption problems itself. But note that you cannot load more than about 1 GB of permanent data in the current VM.

One known performance limitation is in handling extremely long lines/records.

If you have a question or problem, just ask.

Sven

> On 04 Apr 2015, at 19:54, Norbert Hartl wrote:
>
> The NeoCSVReader supports the necessary stream protocol. Once you have set up the CSV reader, you can call #next on it and filter by condition. There is also #atEnd, so a simple loop should do. But I have never used the CSV reader myself, so Sven might have much better options.
>
> Norbert
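As an illustration of the two approaches mentioned in this thread (an explicit #atEnd / #next loop, and #select:thenDo:), here is a minimal sketch. The sample data, the variable names and the "second field greater than 100" condition are made up for the example and have nothing to do with the actual Datathon files. It assumes a default reader with no field definitions, which returns each record as an Array of Strings.

```smalltalk
"Sketch only: the input string and the filter condition are hypothetical."
| input reader kept |
input := 'a,50
b,150
c,300'.

"Approach 1: explicit loop using #atEnd and #next.
 Only the records that pass the test are kept in memory."
reader := NeoCSVReader on: input readStream.
kept := OrderedCollection new.
[ reader atEnd ] whileFalse: [
	| record |
	record := reader next.
	record second asNumber > 100
		ifTrue: [ kept add: record ] ].

"Approach 2: the same filter expressed with #select:thenDo:."
kept := OrderedCollection new.
(NeoCSVReader on: input readStream)
	select: [ :record | record second asNumber > 100 ]
	thenDo: [ :record | kept add: record ].
```

For a large file, the in-memory string would be replaced by a read stream on the file. Since records are read one at a time, only what the filter block lets through accumulates in the image, which is what matters given the limit of about 1 GB mentioned above.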
Thanks Sven for your support!
Alexandre

--
Alexandre Bergel
http://www.bergel.eu