Hi all
I was curious about the relative distribution of characters in Squeak code. I sampled the source code[1] and drew a histogram (attached). Here are my results:

- The most frequent (printable) characters are, in order:
    etarsoinl:
  and in more detail, the 90 most frequent characters:
    etarsoinl:cdfumhpg.ybwSv"=1CT'x][0F)(k2ANPI|M^B4O7D6R3598#EL-,zWVjU;H+q/>*<G@KX${}YQZJ\~?!
- This is quite close to actual English:
    etaonishrlducmwyfgpbvkjxqz
- Observations:
  - The most frequent punctuation character is : and . follows a long way behind.
  - Cascading is comparatively rare. We have more blocks and equality/identity comparisons than ;
  - Blocks are more common than parentheses and literal arrays.
  - You cannot spell ifTrue or ifFalse with the 20 most common characters.
  - ifTrue: is far more common than ifFalse:
  - The most frequent uppercase character is S. I have no conjecture here, tho.
- Comparison:
  - Here's C, sampling the Linux kernel:
      et_risancodlupfm,);(*0hvgb-E=x>ITRSACkNL.P1O/wD2My"{}UF&3GB4q86HV5:<X#[]+zK7W9Y|%\!jQZ'
    - under_score_case vs. camelCase is rather obvious.
    - (Not displayed, but tab and newline are among the 6 most frequent characters!)
    - Punctuation starts much earlier.
    - The beginning differs a lot, the ending not so much.
    - 0 is far more important than 1.
    - : is unimportant.
  - Here's Ruby, sampling Rails:
      etsaonridl_cupmh.f:,"gb')(=y#vw/kq>ATx0<1R[]@S{}CE|2?-zjDMIPN+BO\F3L5!HU%&4*98GW6;YV7J`X
    - The underscore shows, but not as much as in C.
    - The : is (as in Smalltalk) more important.
    - Uppercase is less common than in both C and Smalltalk.

Have fun!

Best regards
	-Tobias

[1]:
    " Uses the new HistogramMorph "
    | characterFrequency |
    CurrentReadOnlySourceFiles cacheDuring: [
        " Keep only methods whose collection literals total fewer than 1500 elements,
          then gather the non-separator characters of their source into a Bag. "
        characterFrequency := ((CompiledMethod allInstances
            select: [:method |
                (method allLiterals detectSum: [:lit |
                    lit isCollection ifFalse: [0] ifTrue: [lit size]]) < 1500])
            gather: [:method | method getSource reject: [:c | c isSeparator]]) asBag].
    (HistogramMorph on: characterFrequency)
        labelBlock: [:c | c codePoint > 32 ifTrue: [c asString] ifFalse: [c printString]];
        openInWorld.
    " The 90 most frequent characters, most frequent first. "
    ((characterFrequency sortedCounts collect: [:ea | ea value]) first: 90) join.
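
The remark that ifTrue and ifFalse cannot be spelled with the 20 most common characters can be checked in a workspace. This is only a minimal sketch: it assumes the top-20 string is simply the first 20 characters of the 90-character list above, copied by hand, and the names top20 and missing are made up for illustration.

```smalltalk
| top20 missing |
"First 20 characters of the 90-character frequency list (copied by hand)."
top20 := 'etarsoinl:cdfumhpg.y'.
"For each word, keep only the characters that are NOT among the top 20."
missing := #('ifTrue' 'ifFalse') collect: [:word |
	word reject: [:c | top20 includes: c]].
missing
```

Printing the result should leave just the uppercase T and F, which both appear further down the list. The lowercase letters of both words are all within the first 20.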
Cool

Best,
Karl

On Wed, Jun 22, 2016 at 10:40 AM, Tobias Pape <[hidden email]> wrote:
In reply to this post by Tobias Pape
:)

"Do you think the author might be interested in rewriting his work to cut it down? If you cut out all the 'O's, you might lose six pages there."

http://www.dailymotion.com/video/x4n10h_mr-mann-bookshop_fun

Best,
Marcel