Report about the D4D datathon challenge

SergeStinckwich
Dear all,

This is a short report on the RESILIENCE team's use of Pharo & Roassal
for the D4D datathon challenge: http://www.d4d.orange.com/en/home

The idea was to use mobile network data from Senegal, provided by
Orange, to address some development problems.

The amount of data provided by Orange was really huge, something like
66 GB: we have the number and duration of calls between each pair of GSM
antennas in Senegal (1666 different antennas) and also some
information about the mobility of people. You can find more
information about the data here:
http://arxiv.org/abs/1407.4885

We tried to extract some data from August 2013, because we know that
floods occurred at that time in Dakar. Our main objective was to
be able to visualise mobility and communication patterns with ROASSAL
to see if there are any differences before and after the floods.

We had a lot of problems to solve in a short period of time :-)

At the beginning, we couldn't open huge files with Pharo. Even opening a
file of around 3.4 GB is impossible: Pharo says that the file does
not exist.
Thierry told us that one must compile the VM with
-D_FILE_OFFSET_BITS=64 (the Cog VMs are also built with
-D_GNU_SOURCE).
Why is this not done by default?
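
A plausible reading of this failure, assuming a 32-bit VM built without
large-file support: on such builds the C file offset type (off_t) is a
signed 32-bit integer, and defining _FILE_OFFSET_BITS=64 widens it to
64 bits. The arithmetic can be sketched in Python; the helper names are
made up for illustration and the exact file size is an assumption.

```python
def max_signed_offset(bits: int) -> int:
    """Largest byte offset representable in a signed integer of `bits` bits."""
    return 2 ** (bits - 1) - 1


def fits_in_offset(size_bytes: int, bits: int) -> bool:
    """True if a file of `size_bytes` can be addressed with a signed `bits`-bit offset."""
    return 0 <= size_bytes <= max_signed_offset(bits)


# Assumption: "around 3.4 GB" taken as 3.4 * 10**9 bytes; the exact size is not given.
file_size = 3_400_000_000

print(max_signed_offset(32))          # 2147483647, i.e. just under 2 GiB
print(fits_in_offset(file_size, 32))  # False
print(fits_in_offset(file_size, 64))  # True
```

A 3.4 GB file is past the 32-bit limit but far below the 64-bit one,
which would be consistent with the flag fixing the problem.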

After that we decided to split the files into many smaller files, and we
tried to use a MySQL database and various other tools, but we still had
a lot of trouble because of the size of the data... In the end, we
produced some visualisations, but only on a tiny portion of the data
covering one day (one day is around 100 MB of data). You can find some
visualisations done by our team here:
https://twitter.com/sergestinckwich/status/586178094215606273
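
The splitting step mentioned above could be done along these lines. This
is only a sketch in Python, not the tool the team actually used: the
function name, the chunk naming scheme and the chunk size are all
assumptions. It reads the source line by line, so memory use stays small
whatever the file size.

```python
import os
import tempfile


def split_by_lines(src_path, out_dir, lines_per_chunk):
    """Stream src_path line by line into numbered chunk files; return their paths."""
    paths = []
    out = None
    try:
        with open(src_path, "r", encoding="utf-8", newline="") as src:
            for index, line in enumerate(src):
                if index % lines_per_chunk == 0:
                    # Start a new chunk every `lines_per_chunk` lines.
                    if out is not None:
                        out.close()
                    path = os.path.join(out_dir, f"chunk_{len(paths):05d}.csv")
                    out = open(path, "w", encoding="utf-8", newline="")
                    paths.append(path)
                out.write(line)
    finally:
        if out is not None:
            out.close()
    return paths


# Tiny demonstration on a temporary 10-line file (made-up rows).
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "calls.csv")
with open(source, "w", encoding="utf-8", newline="") as f:
    for i in range(10):
        f.write(f"row-{i}\n")

chunks = split_by_lines(source, workdir, lines_per_chunk=4)
print([os.path.basename(p) for p in chunks])
```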

Managing huge amounts of data in Pharo and also in ROASSAL is still a
challenge and we definitely need more tools in this area.

We had some problems with ROASSAL: bugs with RTEdgesBuilder (Alvaro,
can you send a report to Alex about that?), too much time to process
some visualisations, ...

In the end, our team received the honorary prize:
https://twitter.com/sergestinckwich/status/585866625745887232

Here is a brief description of our project.

=========================================================
Project name: RESILIENCE
Team members: Clémence Douard (ENSCI), Onil Goubier (CIRELA), Alvaro
Peralta (LabU / Chile University), Aurélie Thouron (ENSCI), Serge Stinckwich
(IRD / UMMISCO)

The main concern of this project is to improve resilience after an
environmental crisis in a southern city. During the preparedness phase
before the crisis, records of inhabitants' habits (like patterns of
mobility or communications) are collected in order to characterize the
situation before the crisis. Assessments of the improvements will then
be done by providing suitable visualizations of the resilience
dynamic. The goal is to help citizens to represent these improvements
and to take advantage of them in their daily lives.

We took as an example the city of Dakar during the flooding that
occurred in August 2013.

The data used are those of datasets 1 (communication between
antennas) and 2 (user mobility) provided by Orange. We use this
information, coupled with OSM and other data related to the August
floods in Dakar, to construct maps and indicators to show resilience.
Visualizations are made with the agile visualization platform Roassal:
http://agilevisualization.com/
=========================================================

I would like to thank all the team members and also the people who
helped us remotely: Alexandre Bergel (ObjectProfile/Chile University)
and Thierry Goubier (CEA).

I would also like to thank the SIMPLON team and ORANGE for organizing
such a nice event.

Regards,
--
Serge Stinckwich
UCBN & UMI UMMISCO 209 (IRD/UPMC)
Every DSL ends up being Smalltalk
http://www.doesnotunderstand.org/


Re: Report about the D4D datathon challenge

volkert-2
Great story and great work. Really helpful to get a feeling for the
current "limits" of the Pharo platform ... far away from my current
data volume requirements, but good to know ...

Thanks for sharing.

BW,
Volkert



Re: Report about the D4D datathon challenge

EstebanLM

> On 09.04.2015 at 16:52, Serge Stinckwich wrote:
>> At the beginning, we couldn't open huge files with Pharo. Even opening a
>> file of around 3.4 GB is impossible: Pharo says that the file does
>> not exist.
>> Thierry told us that one must compile the VM with
>> -D_FILE_OFFSET_BITS=64 (the Cog VMs are also built with
>> -D_GNU_SOURCE).
>> Why is this not done by default?

No idea… probably because I don’t work on Windows, so these things escape me :)
I added the flags to the builds and hopefully tomorrow there will be capable VMs.

cheers,
Esteban




Re: Report about the D4D datathon challenge

stepharo
In reply to this post by volkert-2
Superb

Now what would be great is to continue the story to make sure that Pharo
and Roassal can be used for real.
Having a streaming API for files is apparently a first step.

Stef


On 9/4/15 20:10, volkert wrote:

> Great story and great work. Really helpful to get a feeling for the
> current "limits" of the Pharo platform ... far away from my current
> data volume requirements, but good to know ...
>
> Thanks for sharing.
>
> BW,
> Volkert



Re: Report about the D4D datathon challenge

Sean P. DeNigris
Administrator
In reply to this post by EstebanLM
EstebanLM wrote:
> I added the flags to the builds and hopefully tomorrow there will be capable VMs.

Step by step we will get there :)
Cheers,
Sean

Re: Report about the D4D datathon challenge

SergeStinckwich
In reply to this post by EstebanLM
On Thu, Apr 9, 2015 at 8:32 PM, Esteban Lorenzano <[hidden email]> wrote:

>
> No idea… probably because I don’t work on Windows, so these things escape me :)

We use a Linux VM.

--
Serge Stinckwich
UCBN & UMI UMMISCO 209 (IRD/UPMC)
Every DSL ends up being Smalltalk
http://www.doesnotunderstand.org/


Re: Report about the D4D datathon challenge

Thierry Goubier
In reply to this post by stepharo
On 09/04/2015 22:37, stepharo wrote:
> Superb
>
> Now what would be great is to continue the story to make sure that Pharo
> and Roassal can be used for real.
> Having a streaming API for files is apparently a first step.

We have such a streaming API (in the case of D4D, we had one: NeoCSV
has it)...
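
To illustrate what streaming buys here, this is a minimal sketch in
Python rather than NeoCSV, so it says nothing about NeoCSV's own API.
Rows are consumed one at a time and only a running total per antenna
pair is kept in memory. The column layout (timestamp, source antenna,
destination antenna, number of calls, total duration) and the sample
rows are assumptions made for the example.

```python
import csv
import io
from collections import defaultdict


def total_calls_per_pair(rows):
    """Sum the call counts per (source, destination) pair, one row at a time."""
    totals = defaultdict(int)
    for timestamp, src, dst, n_calls, duration in rows:
        totals[(src, dst)] += int(n_calls)
    return dict(totals)


# Made-up sample standing in for a large file; with a real file one would
# pass csv.reader(open(path, newline="")) instead.
sample = io.StringIO(
    "2013-08-01 00,1,2,3,120\n"
    "2013-08-01 00,1,3,1,30\n"
    "2013-08-01 01,1,2,2,60\n"
)

totals = total_calls_per_pair(csv.reader(sample))
print(totals)  # {('1', '2'): 5, ('1', '3'): 1}
```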

But if the VM is unable to open the file in the first place :(

What can we do?

Thierry




Re: Report about the D4D datathon challenge

Martin Bähr
Excerpts from Thierry Goubier's message of 2015-04-09 23:04:55 +0200:
> But if the VM is unable to open the file in the first place :(

create a named pipe, let the VM open the pipe and then cat the file into the pipe.
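
A minimal sketch of that idea, in Python on a POSIX system instead of
Pharo: a writer thread stands in for `cat file > pipe`, and the reader
stands in for the VM. Function and file names are made up for
illustration. The reader only ever opens the pipe, so the size of the
underlying file presumably never comes into play; the price is that the
data can only be read sequentially, without seeking.

```python
import os
import shutil
import tempfile
import threading


def feed_pipe(src_path, fifo_path):
    """Play the role of `cat src_path > fifo_path`."""
    with open(src_path, "rb") as src, open(fifo_path, "wb") as fifo:
        shutil.copyfileobj(src, fifo)


def count_lines_via_pipe(src_path):
    """Read src_path through a named pipe and count its lines."""
    workdir = tempfile.mkdtemp()
    fifo_path = os.path.join(workdir, "data.fifo")
    os.mkfifo(fifo_path)
    writer = threading.Thread(target=feed_pipe, args=(src_path, fifo_path))
    writer.start()
    try:
        # Opening the pipe blocks until the writer has opened its end.
        with open(fifo_path, "rb") as fifo:
            return sum(1 for _ in fifo)
    finally:
        writer.join()
        os.remove(fifo_path)
        os.rmdir(workdir)


# Small demonstration file (made-up content).
demo_dir = tempfile.mkdtemp()
sample_path = os.path.join(demo_dir, "sample.csv")
with open(sample_path, "w", encoding="utf-8") as f:
    f.write("a,b\n1,2\n3,4\n")

line_count = count_lines_via_pipe(sample_path)
print(line_count)  # 3
```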

greetings, martin.

--
eKita                   -   the online platform for your entire academic life
--
chief engineer                                                       eKita.co
pike programmer      pike.lysator.liu.se    caudium.net     societyserver.org
secretary                                                      beijinglug.org
mentor                                                           fossasia.org
foresight developer  foresightlinux.org                            realss.com
unix sysadmin
Martin Bähr          working in china        http://societyserver.org/mbaehr/