Some interesting datasets | https://numeracy.co

Some interesting datasets | https://numeracy.co

Sven Van Caekenberghe-2
Hi,

I know a lot of people are interested in (public) datasets (to use as examples). The website https://numeracy.co contains a (small) number of interesting ones. Here is how to access them using NeoCSV.

(NeoCSVReader on:
 'https://numeracy.co/standard-library/us-population/states.csv' asUrl retrieveContents readStream) upToEnd.

or

ZnClient new
 url: 'https://numeracy.co/standard-library/us-population/states.csv';
 contentReader: [ :entity | (NeoCSVReader on: entity readStream) upToEnd ];
 get.

Of course, the files are not encoded as UTF-8 and the server does not advertise the actual encoding, so for some datasets you need to set the encoding explicitly (Latin-1 here).

(NeoCSVReader on:
  (ZnDefaultCharacterEncoder
     value: ZnCharacterEncoder latin1
     during: [ 'https://numeracy.co/standard-library/us-population/cities.csv' asUrl retrieveContents ])
       readStream) upToEnd.

(Warning: this last example is quite large, 500K records).
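Given the size, it may be more convenient to look at just the first few records instead of materializing everything with upToEnd. A minimal sketch, assuming the same Latin-1 workaround as above; the variable names and the limit of 10 are only for illustration, and the first record may well be a header row:

```smalltalk
| contents reader firstRows |
"Fetch the file as a String, decoding it as Latin-1 (same workaround as above)"
contents := ZnDefaultCharacterEncoder
  value: ZnCharacterEncoder latin1
  during: [ 'https://numeracy.co/standard-library/us-population/cities.csv' asUrl retrieveContents ].
reader := NeoCSVReader on: contents readStream.
firstRows := OrderedCollection new.
"Parse records one at a time and stop after 10 (arbitrary limit)"
[ reader atEnd or: [ firstRows size >= 10 ] ]
  whileFalse: [ firstRows add: reader next ].
firstRows.
```

Note that the whole file is still downloaded; only the parsing stops early.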

Sven



Re: Some interesting datasets | https://numeracy.co

sirwart
Hey Sven,

I put together that dataset and the latin-1 encoding was an oversight. Sorry!

I fixed it so it's UTF-8 like it should have been from the start. You might have to remove the latin-1 decoder from the cities.csv snippet to decode it properly.
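For reference, with the latin-1 wrapper removed, the cities.csv snippet would presumably reduce to the same form as the states.csv example. This is a sketch derived from the snippets earlier in this thread, assuming the default decoding now handles the file correctly:

```smalltalk
"Same shape as the states.csv example, no explicit encoder"
(NeoCSVReader on:
 'https://numeracy.co/standard-library/us-population/cities.csv' asUrl retrieveContents readStream) upToEnd.
```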

Brian

Re: Some interesting datasets | https://numeracy.co

Sven Van Caekenberghe-2
Very good. Thank you, Brian.

> On 10 Aug 2016, at 00:59, sirwart <[hidden email]> wrote:
>
> Hey Sven,
>
> I put together that dataset and the latin-1 encoding was an oversight.
> Sorry!
>
> I fixed it so it's UTF-8 like it should have been from the start. You might
> have to remove the latin-1 decoder from the cities.csv snippet to decode it
> properly.
>
> Brian