ZnClient: getting more than 19 tweets for data scraping


Offray
Hi,

Recently Paul DeBruicker taught me how to refine my code for getting
tweets properly. Consider this:

=[1]====================================
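| source anUrl tweets |

"Download the public profile page and parse the HTML with Soup."
anUrl := 'https://twitter.com/offrayLC'.
source := Soup fromString: (ZnEasy get: anUrl) contents asString.
"Collect the text of every tag carrying the tweet-text CSS class."
tweets := (source findAllTagsByClass: 'ProfileTweet-text') collect: [:ea | ea text].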
========================================

This works fine, but I would like to get more than 19 tweets, which is
what you get by default. Is there any way to tell ZnEasy and friends to
get more tweets, something similar to what happens when you scroll down
a Twitter page?

And by the way, I would like to make more sense of the Soup I got in the
last line. ea text gives me the tweet contents, but how can I interpret
the metadata in the soup (whether it is a retweet, date of publishing
and so on)? I could do this for most of the Twitter profile page, but the
tweet is kind of elusive. For example, how would I know that "text" is the
proper message for getting the tweet content? Any pointer on how to make
sense of it by myself is greatly appreciated.

Cheers,

Offray


Re: ZnClient: getting more than 19 tweets for data scraping

Sven Van Caekenberghe-2
What Paul showed is basically just a hack.

What you probably want is full API access to Twitter. That gives you the real thing, but it is more work and you have to understand all the technical details (unless somebody already did it for you, I don't know; I do know that Zinc-SSO can connect to Twitter).

https://dev.twitter.com/overview/api

> On 07 Apr 2015, at 20:23, Offray Vladimir Luna Cárdenas <[hidden email]> wrote:
>
> [...] I would like to get more than 19 tweets, which is what you get by default. [...]



Re: ZnClient: getting more than 19 tweets for data scraping

Paul DeBruicker
Offray - What Sven said is correct. You're not getting an answer about how to violate their Terms of Service because this isn't that kind of place. You've asked 3 times. Once is usually enough. Use the API. For the Soup questions, get an inspector on an instance of a SoupTag, start sending it messages it understands, and see what you get. Trial and error. Or read the Python Soup docs, as the commands probably have an equivalent in the Smalltalk library. Most of this programming stuff is reading, doing a little experiment, thinking, then trying again.
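
As a rough illustration of that trial-and-error approach, here is a minimal playground sketch. It assumes the Soup package is loaded in the image and uses a tiny made-up HTML fragment (not real Twitter markup) so that nothing has to be downloaded. The variable names are only illustrative; the reflection messages (inspect, selectors, respondsTo:) are standard Pharo.

```smalltalk
| html soup tag |

"A made-up HTML fragment stands in for a downloaded page (assumption)."
html := '<div><p class="ProfileTweet-text">Hello Pharo</p></div>'.
soup := Soup fromString: html.

"Pick one tag to experiment with."
tag := (soup findAllTagsByClass: 'ProfileTweet-text') first.

"Open an inspector on the tag to browse its state interactively."
tag inspect.

"List, in alphabetical order, the messages its class implements."
tag class selectors asSortedCollection.

"Check that the tag understands a message before sending it."
tag respondsTo: #text.

"Send the message and look at what comes back."
tag text.
```

Evaluating each expression with "Inspect it" or "Print it" shows what a SoupTag holds and which messages are worth trying next.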

Sven - I only showed him that SoupTag has a #text message. I'm sure you're busy and had forgotten that the first time he/she asked, they stated that they don't want to use the API: http://forum.world.st/Data-scrapping-in-pharo-Extracting-tweets-contents-td4817746.html and provided the download code in a ws.stfx.eu snippet.

Hope this helps

Paul


Re: ZnClient: getting more than 19 tweets for data scraping

Sven Van Caekenberghe-2

> On 08 Apr 2015, at 15:29, Paul DeBruicker <[hidden email]> wrote:
>
> Offray - What Sven said is correct.  You're not getting an answer about how
> to violate their Terms of Service because this isn't that kind of place.
> You've asked 3 times.  Once is usually enough.  Use the API. For the Soup
> questions get an inspector on an instance of a SoupTag and start sending it
> messages it understands and see what you get. Trial and error.  Or read the
> python Soup docs as the commands probably have an equivalent in the
> Smalltalk library.  Most of this programming stuff is reading, doing a
> little experiment,  thinking, then trying again.  
>
> Sven - I only showed him that SoupTag has a #text message. I'm sure you're
> busy and had forgotten that the first time he/she asked they stated that
> they don't want to use the api:
> http://forum.world.st/Data-scrapping-in-pharo-Extracting-tweets-contents-td4817746.html
> and provided the download code in an ws.stfx.eu snippet.  

Paul, I know you understand, we're on the same page. Sven




Re: ZnClient: getting more than 19 tweets for data scraping

Offray
In reply to this post by Paul DeBruicker
Hi Paul and Sven,

I will try the Twitter API if it is necessary, but I'm not trying to get
support here on how to violate Twitter's ToS. I'm well aware of them,
but surely there are exceptions. The mail I shared with you (the one
Paul points to) about why I would prefer scraping over the API
doesn't go into detail; it just said that people who do not have a
Twitter account, or have not signed the ToS, should be able to get
some Twitter info. That special kind of info is the one regarding public
political discourse, and my idea of scraping *public* and
*specific* data from Twitter arises in the context of a project for
citizen oversight of political issues empowered by ICT. The project is
discussed in detail here:

https://www.newschallenge.org/challenge/elections/entries/datapolis-data-narratives-visualizations-for-citizen-oversight-of-politicians-discourses-and-public-contracts-in-social-media-and-the-web

So, for the moment, I will get the proper account permissions for getting
Twitter data, but my conviction for the long term is that public
political discourse (among others), even the kind that circulates on
private networks like Twitter or Facebook, should be governed by
constitutional terms (like free speech and wide political participation)
and not by the more restrictive ones of Twitter or Facebook.

This is a sensitive issue and surely needs more discussion, maybe
another time, but I will follow your advice, extract data through the
Twitter API, and come back here with questions about it.

Thanks for all your help and support,

Offray
