Tagcloud around avatar part 2: Advances, questions and suggestions

Offray
Hi,

Following the advice of Peter Uhnák on tag clouds and avatars, I made
some progress on my intended visualization. If you run the code at [1]
you will get something similar to [2] (the difference is that the
screenshot is for code inside a grafoscopio document instead of a simple
playground).


[1] http://ws.stfx.eu/9G5PEGYFL1MW
[2] http://mutabit.com/deltas/repos.fossil/datapolis/doc/tip/Figures/personal-tagcloud.png

I will prioritize scraping and cleaning the data, leaving the position
of the avatar for the end (hopefully Alexandre will read this and, in
his attempt to make Roassal the best visualization engine in the
universe and its users happier, will implement my suggestion at the
end).

So, to clean the data, I'm trying to process originalText (look at [1])
and split it into single words. To do that, I copy the text, replace
every occurrence of punctuation characters and parentheses with spaces,
and then apply #splitOn: ' ' to the new string. I did this with the
chunk of code at [3], but it seems inelegant, and using cascades ending
in #yourself didn't do the trick.

=[3]==========================

cookedText1 := originalText.
cookedText1 := cookedText1 copyReplaceAll: ',' with: ' '.
cookedText1 := cookedText1 copyReplaceAll: ';' with: ' '.
cookedText1 := cookedText1 copyReplaceAll: '.' with: ' '.
cookedText1 := cookedText1 copyReplaceAll: ':' with: ' '.
cookedText1 := cookedText1 copyReplaceAll: ')' with: ' '.
cookedText1 := cookedText1 copyReplaceAll: '(' with: ' '.
==============================

So here come my questions:

a) Is there a more elegant, Smalltalk-ish way to replace the code at
[3], so that I get only words, no matter whether they are separated by
spaces or punctuation marks, or start/end with parentheses?

b) Why are some uninteresting words, like the Spanish 'La' or 'Se',
still finding their way into the final visualization even though I try
to filter them out with the code at [4]?

=[4]==========================
(cookedText1 splitOn: ' ') do: [:word |
    "Keep words longer than one character that are not in the stop list."
    (word size > 1 and: [(uninterestingWords includes: word asLowercase) not])
        ifTrue: [cookedText2 := cookedText2, word, ' ']].
==============================

And my suggestion:

Please consider making tag clouds with variable layouts and forms.
Python has something similar with [5]

[5] http://sebastianraschka.com/Articles/2014_twitter_wordcloud.html

I will be waiting for your suggestions and thanks for keeping
Pharo/Moose awesome!

Cheers,

Offray

_______________________________________________
Moose-dev mailing list
[hidden email]
https://www.iam.unibe.ch/mailman/listinfo/moose-dev

Re: Tagcloud around avatar part 2: Advances, questions and suggestions

Nicolai Hess
Hi Offray,

2015-03-31 18:17 GMT+02:00 Offray Vladimir Luna Cárdenas <[hidden email]>:

> a) Is there a more elegant, Smalltalk-ish way to replace the code at [3], so that I get only words, no matter whether they are separated by spaces or punctuation marks, or start/end with parentheses?

Did you try RxMatcher? It is probably much slower, but more flexible.
cookedText1 := (RxMatcher forString: '\w+') matchesIn: originalText.
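
On the cascade attempt mentioned in the question: a cascade sends every message to the same receiver, and copyReplaceAll:with: answers a new string instead of changing the receiver, so each copy is discarded and #yourself answers the untouched original. If you prefer to stay with the replacement approach, here is a minimal sketch that folds the replacements into one expression. The sample string is made up and only stands in for the real originalText.

```smalltalk
| originalText cookedText1 |
"Hypothetical sample standing in for the real originalText."
originalText := 'uno, dos; tres (cuatro).'.

"Each step feeds the copy answered by the previous replacement into the next one."
cookedText1 := ',;.:()'
    inject: originalText
    into: [:text :char | text copyReplaceAll: char asString with: ' '].
```

Note that this only handles the listed punctuation characters; line breaks are still left in the string.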
 

> b) Why are some uninteresting words, like the Spanish 'La' or 'Se', still finding their way into the final visualization even though I try to filter them out with the code at [4]?

Because your copyReplaceAll:with: calls replace only punctuation, not invisible characters such as line breaks ('\n').
(The RxMatcher result does not include the line break characters, so this problem shouldn't occur.)
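
To see the effect, here is a minimal playground sketch with a made-up two-line sample (the sample text and the stop-word list are assumptions, not the data from [1]). Splitting on a single space leaves the line break inside a token, so the stop-word test never sees 'La' on its own:

```smalltalk
| sample uninterestingWords tokens gluedIsFiltered cleanIsFiltered |
"Hypothetical sample: 'La' starts a new line, so no space precedes it."
sample := 'datos', String cr, 'La ciudad'.
uninterestingWords := #('la' 'se').

"Splitting on a space gives two tokens; the first is 'datos', a line break and 'La' glued together."
tokens := sample splitOn: ' '.
gluedIsFiltered := uninterestingWords includes: tokens first asLowercase.  "false, so the token is kept"

"Splitting on every non-letter isolates 'La', so the check can catch it."
tokens := sample splitOn: [:each | each isLetter not].
cleanIsFiltered := uninterestingWords includes: (tokens at: 2) asLowercase.  "true"
```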
 

> Please consider making tag clouds with variable layouts and forms. Python has something similar with [5]
>
> [5] http://sebastianraschka.com/Articles/2014_twitter_wordcloud.html

Yes, that looks great.


nicolai
 


Re: Tagcloud around avatar part 2: Advances, questions and suggestions

Nicolai Hess
2015-04-01 13:12 GMT+02:00 Nicolai Hess <[hidden email]>:
> Did you try RxMatcher? Probably much slower, but more flexible.
> cookedText1 := (RxMatcher forString: '\w+') matchesIn: originalText.

Another way:

cookedText1 := originalText splitOn: [:x | x isLetter not].

and for removing empty and uninteresting words:

cookedText1 := cookedText1 reject: [:k | k size < 2 or: [uninterestingWords includes: k asLowercase]].

and finally create a new space-delimited string:

cookedText2 := String streamContents: [:s | cookedText1 asStringOn: s delimiter: String space].
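
Put together, the three steps can be tried in a playground. This is only a sketch: the sample text and the contents of uninterestingWords are placeholders for the real data from [1], and the intermediate variable is named words here just for readability.

```smalltalk
| originalText uninterestingWords words cookedText2 |
"Placeholder data; the real text comes from the code at [1]."
originalText := 'La ciudad (y sus datos): se ve', String cr, 'la ciudad.'.
uninterestingWords := #('la' 'se' 'sus' 've').

"1. Split on anything that is not a letter (spaces, punctuation, line breaks)."
words := originalText splitOn: [:each | each isLetter not].

"2. Drop empty or one-letter strings and the stop words."
words := words reject: [:each |
    each size < 2 or: [uninterestingWords includes: each asLowercase]].

"3. Join the remaining words with single spaces."
cookedText2 := String streamContents: [:stream |
    words asStringOn: stream delimiter: String space].
```

With this sample, only 'ciudad', 'datos' and 'ciudad' should survive the filter, so cookedText2 keeps repeated words and the tag cloud can still weight them by frequency.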


nicolai



Re: Tagcloud around avatar part 2: Advances, questions and suggestions

Offray
Thanks a lot, Nicolai!

This second version works really well and is a lot more readable than
the one involving regular expressions. I will share advances and new
visualizations once I have more interesting data to feed the tag cloud.

Cheers,

Offray


Re: Tagcloud around avatar part 2: Advances, questions and suggestions

Peter Uhnak
In reply to this post by Offray
> Please consider making tag clouds with variable layouts and forms. Python has something similar with [5]
>
> [5] http://sebastianraschka.com/Articles/2014_twitter_wordcloud.html

The problem is that right now there is no layout specifically for tag clouds in Roassal; it uses a quite generic algorithm (RTRectanglePackLayout).

If anyone is interested in this topic, today I also stumbled upon this: http://static.mrfeinberg.com/bv_ch03.pdf

Peter 
