Re: Panama Papers: a case for reproducible research, data activism and frictionless data (powered by Pharo)
Posted by Offray Vladimir Luna Cárdenas on May 21, 2016; 2:52am
URL: https://forum.world.st/Panama-Papers-a-case-for-reproducible-research-data-activism-and-frictionless-data-powered-by-Pharo-tp4896412p4896469.html
Thanks Gastón for your interest.
I used CSV and imported it into SQLite, because that's the format
the ICIJ released their info in, and it lets me query aggregated
information in an easy way. I bridge SQLite with Pharo using UDBC,
and then the choropleth map was made with Roassal. Details are in
the blog post ;-).
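To give an idea of that bridge, here is a minimal sketch of querying
the imported database from Pharo. It assumes the UDBC-SQLite3 driver
of the time (newer images rename the classes to SQLite3Connection and
friends, and selector names may differ slightly), and the table and
column names ('entities', 'countries') are just placeholders for
whatever your CSV import produces, not the exact ICIJ schema:

  "Open the SQLite database built from the ICIJ CSV files and
   aggregate the number of entities per country."
  | connection cursor counts |
  connection := UDBCSQLite3Connection openOn: 'offshore_leaks.db'.
  [ cursor := connection execute:
      'SELECT countries, COUNT(*) AS total
       FROM entities
       GROUP BY countries
       ORDER BY total DESC'.
    counts := Dictionary new.
    cursor rows do: [ :row |
      counts at: (row at: 'countries') put: (row at: 'total') ] ]
      ensure: [ connection close ].
  counts

Something along those lines gives the country -> count data that the
Roassal choropleth can be built from.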
My first attempt was to load all the nodes (Entities in the
offshore leaks database) into Roassal and query/visualize them
directly, but with over 150k nodes the environment started to lag
and wasn't as responsive as I wanted for exploring the dataset.
That's why I quickly switched to SQLite. I think this keeps the
environment agile and covers a pretty good amount of the cases
where you work with tabular data, and even some specific graphs
could be replicated from the exported CSV files containing the
entities and their relationships (see the sketch below). My focus
was more on the accuracy of the visualization, trying to put the
rest of the territories on a Roassal map. If you're interested I
can put together a quick script to run the visualization/notebook
in your Moose image.
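As a rough illustration of replicating a graph from the exported
CSV files, something along these lines should work with NeoCSV and
Roassal2's RTMondrian. The file names and column positions are my
assumptions about the export (adjust them to the real files), and I
keep only a small subset of nodes precisely to avoid the lag I
mentioned above:

  "Read entity ids and relationships from the exported CSV files.
   Assumed layout: node id in the first column of Entities.csv,
   source/target ids in the first two columns of all_edges.csv."
  | entityIds edges ids someEdges b |
  entityIds := ((NeoCSVReader on: 'Entities.csv' asFileReference readStream)
      skipHeader; upToEnd) collect: [ :row | row first ].
  edges := (NeoCSVReader on: 'all_edges.csv' asFileReference readStream)
      skipHeader; upToEnd.

  "Keep a small subset so the environment stays responsive."
  ids := (entityIds first: 500) asSet.
  someEdges := edges select: [ :e |
      (ids includes: e first) and: [ ids includes: e second ] ].

  b := RTMondrian new.
  b shape circle size: 5.
  b nodes: ids.
  b edges source: someEdges connectFrom: #first to: #second.
  b layout force.
  b
  "Inspect the result in a Playground to render the view."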
I have not used Neo4j, but there will be a seminar on how it was
used in the Panama Papers investigation next Tuesday:
http://info.neo4j.com/0526-register.html
Cheers,
Offray
On 20/05/16 16:42, Gastón Dall' Oglio wrote:
Hi. Looks good :)
Just out of curiosity, what data format did you use? CSV,
SQLite?
A question: can you use Neo4reSt to store the data and
Pharo/Roassal to display it in a more or less friendly way, or
is there a lot of impedance between the graph models?