Panama Papers: a case for reproducible research, data activism and frictionless data (powered by Pharo)

Offray Vladimir Luna Cárdenas-2
Hi,

I'm glad to share my recent work with Pharo/Roassal in the form of a
minisite[1] and a detailed blog entry[2] arguing for interactive
environments that increase understanding of, and participation in, data
phenomena:

[1] http://mutabit.com/repos.fossil/panama-papers/doc/tip/index.html
[2] http://mutabit.com/offray/blog/en/entry/panama-papers-1

I make this case using a relatively simple visualization. The bigger
issues were not related to visualization, but to the accuracy and
completeness of the information. For example, the original RTSVGPath
includes only 167 world territories, but the Panama Papers mention over
210. Improving accuracy led to hunting down a bug and fixing it, so we
now have a cleverer SVG reader in Roassal. I also fought for several
days with newbie errors (like the download bar not advancing, despite
the download being made).

I think these are good exemplars of how Pharo/Roassal is a superb,
moldable and affordable platform for data-oriented reproducible
research (in journalism, activism and other fields).

Comments and suggestions are welcome, as always.

Cheers,

Offray

Re: Panama Papers: a case for reproducible research, data activism and frictionless data (powered by Pharo)

Gastón Dall' Oglio
Hi. Looks good :)

Just out of curiosity, what data format did you use? CSV, SQLite?

I am interested in using Neo4j from Pharo (http://smalltalkhub.com/#!/~MasashiUmezawa/Neo4reSt) with a large database, and a few days ago I found that the ICIJ used Neo4j to relate the information. In a few days they will give a webinar:

One question: can Neo4reSt be used to store the data and Pharo/Roassal to display it in a more or less friendly way? Or is there a lot of impedance between the graph models?



In reply to Offray Vladimir Luna Cárdenas's message of 2016-05-20 13:44 GMT-03:00.

Re: Panama Papers: a case for reproducible research, data activism and frictionless data (powered by Pharo)

Offray Vladimir Luna Cárdenas-2
Thanks Gastón for your interest.

I used CSV and imported it into SQLite, because CSV is the way the ICIJ released their info and SQLite lets me query aggregated information easily. I bridged SQLite with Pharo using UDBC, and then the choropleth map was made in Roassal. Details are in the blog post ;-).

My first attempt was to load all nodes (Entities in the Offshore Leaks database) into Roassal and query/visualize directly from it, but with over 150k nodes the environment started to lag and wasn't as responsive as I wanted for exploring the dataset. That's why I quickly switched to SQLite. I think this keeps the environment agile and covers a good share of the cases when you work with tabular data; even some specific graphs could be replicated from the exported CSV files containing the entities and their relationships. My focus was more on the accuracy of the visualization, trying to put the rest of the territories in a Roassal map. If you're interested I can post a quick script to run the visualization/notebook in your Moose image.
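
For readers who want to try the idea outside Pharo, here is a minimal sketch of the CSV-to-SQLite aggregation described above. It is written in Python with only the standard library, not with Pharo/UDBC, and it is not the code used for the minisite. The sample rows, the `entities` table and the `name`/`country_codes` columns are made up for illustration; the real ICIJ files have their own layout.

```python
import csv
import io
import sqlite3

# Hypothetical sample standing in for the ICIJ entities CSV; the real
# file has many more rows and columns, and its column names may differ.
SAMPLE_CSV = """name,country_codes
Alpha Ltd,PAN
Beta Corp,PAN
Gamma SA,CHE
"""


def entities_per_country(csv_text):
    """Load CSV rows into an in-memory SQLite table and count rows per country."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entities (name TEXT, country_codes TEXT)")

    # Parse the CSV text and insert one table row per CSV record.
    reader = csv.DictReader(io.StringIO(csv_text))
    conn.executemany(
        "INSERT INTO entities VALUES (?, ?)",
        ((row["name"], row["country_codes"]) for row in reader),
    )

    # Aggregate inside SQLite instead of holding every node in the
    # visualization environment; ties are ordered by country code.
    result = conn.execute(
        "SELECT country_codes, COUNT(*) FROM entities "
        "GROUP BY country_codes "
        "ORDER BY COUNT(*) DESC, country_codes"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for country, total in entities_per_country(SAMPLE_CSV):
        print(country, total)
```

The per-country counts returned by the query are the kind of aggregated values a choropleth map would then color by.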

I have not used Neo4j, but there will be a seminar next Tuesday on how it was used in the Panama Papers:

http://info.neo4j.com/0526-register.html

Cheers,

Offray

In reply to Gastón Dall' Oglio's message of 20/05/16 16:42.

Re: Panama Papers: a case for reproducible research, data activism and frictionless data (powered by Pharo)

Gastón Dall' Oglio
Thanks for your detailed answer.

I will review your blog post. I have to learn Roassal too, so it's a good way to begin... If you can, yes, please share these scripts :)

Yes, it seems to be the same webinar that I mentioned in my previous mail ;)

Regards.


In reply to Offray Vladimir Luna Cárdenas's message of 2016-05-21 0:27 GMT-03:00.

Re: Panama Papers: a case for reproducible research, data activism and frictionless data (powered by Pharo)

Offray Vladimir Luna Cárdenas-2
Gastón,

With this script you should be able to install Grafoscopio and run the Panama Papers example on a fresh Moose 6 image:

===
(ConfigurationOfGrafoscopio project version: #stable) load.
OffshoreLeaksDB choroplethWorldMapQuick
===

If you want to follow the interactive tutorial, do:

===
OffshoreLeaksDB docDownloadFor: 'intro'.
OffshoreLeaksDB openIntroNotebook
===

Cheers,

Offray

In reply to Gastón Dall' Oglio's message of 23/05/16 07:18.