Data scraping in Pharo: Extracting tweet contents

Offray
Hi,

I'm writing a data scraper for Twitter. I know the Twitter API exists,
but I would like to make the scraped data available to anyone, even to
people who have not signed an API agreement. I also think this kind of
external data is important for making agile visualization less
self-referential, and it could provide some interesting examples built
on data that is familiar to the usual "netizen".

I have made some progress, which you can test easily by executing the
code at [1]: I have already scraped a Twitter profile page and filled
out its data.

[1] http://ws.stfx.eu/E3LD464QI0GR

Now I'm having trouble extracting the tweet data. If I execute the code
at [2], I get a list of tweets (the first 19) and I can explore any
member of the collection, but I can't make sense of the SoupTag data
inside. How can I extract the tweet contents in particular?

[2] http://ws.stfx.eu/JXYM7W7WL1H9

Any help with this will be greatly appreciated.

Cheers,

Offray


Re: Data scraping in Pharo: Extracting tweet contents

Paul DeBruicker
Is this what you want?


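| source anUrl tweets |

"Fetch the profile page and parse the HTML with Soup."
anUrl := 'https://twitter.com/offrayLC'.
source := Soup fromString: (ZnEasy get: anUrl) contents asString.

"Each tweet body is a tag with the CSS class 'ProfileTweet-text'."
tweets := source findAllTagsByClass: 'ProfileTweet-text'.
tweets collect: [ :ea | ea text ].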
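
To try the same calls without hitting twitter.com, here is a minimal
sketch that runs against an inline HTML string. The markup in html is a
made-up stand-in for a profile page, not Twitter's real HTML, and the
class name 'other' is purely illustrative. It uses only the messages
already shown above (Soup fromString:, findAllTagsByClass:, text) and
assumes Soup is loaded in the image.

```smalltalk
| html soup tweets |

"Hypothetical fragment standing in for a downloaded profile page."
html := '<div>
  <p class="ProfileTweet-text">First tweet</p>
  <p class="ProfileTweet-text">Second tweet</p>
  <p class="other">Not a tweet</p>
</div>'.

soup := Soup fromString: html.

"Keep only the tags carrying the tweet-text class, then take their text."
tweets := soup findAllTagsByClass: 'ProfileTweet-text'.
tweets collect: [ :each | each text ]
```

Inspecting the result of the last expression should show one string per
matching tag; the tag with class 'other' is only there to check that it
gets skipped.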







Re: Data scraping in Pharo: Extracting tweet contents

Offray
Paul,

Thanks, that's pretty much what I'm looking for!

Your hint raises a new question: by default I get only a user's 19 most
recent tweets. Is there any way to tell ZnClient to load more data,
similar to what happens when you scroll down the Twitter page?
Sven, any suggestions here?

Thanks again,

Offray

On 05/04/15 at 22:14, Paul DeBruicker wrote:

> Is this what you want?
>
> --
> View this message in context: http://forum.world.st/Data-scrapping-in-pharo-Extracting-tweets-contents-tp4817746p4817756.html
> Sent from the Pharo Smalltalk Users mailing list archive at Nabble.com.