Marcel Taeumel uploaded a new version of Collections to project The Trunk:
http://source.squeak.org/trunk/Collections-mt.838.mcz

==================== Summary ====================

Name: Collections-mt.838
Author: mt
Time: 4 July 2019, 4:32:40.854026 pm
UUID: ac4ab442-79c0-d246-8dec-914be7ee5356
Ancestors: Collections-pre.837

To String, adds simple analysis of natural language in source code. No word stemming.

1) Refactor #findTokens: to look like #lines (i.e. #linesDo: and #lineIndicesDo:).
2) Add #findFeaturesDo: like #findTokens:do: and #linesDo:.

Try this:

HTTPDownloadRequest name findFeatures.
(Morph >> #drawOn:) getSource asString findFeatures.

Where can that be useful?

- Automatic insertion of "*" for search terms like "WeakDictionary" to also find WeakIdentityDictionary etc.
- Prefix emphasis for names lists of classes in code browsers: MCAddition, MCAncestry, etc.

=============== Diff against Collections-pre.837 ===============

Item was added:
+ ----- Method: String>>findFeatureIndicesDo: (in category 'accessing - features') -----
+ findFeatureIndicesDo: aBlock
+ 	"State machine that separates camelCase, UPPERCase, number/operator combinations and skips colons"
+ 	| last state char "0 = start, 1 = a, 2 = A, 3 = AA, 4 = num, 5 = op" |
+
+ 	state := 0.
+ 	last := 1.
+
+ 	1 to: self size do: [ :index |
+ 		char := self at: index.
+ 		"a"
+ 		char isLowercase ifTrue: [
+ 			(state < 3) ifTrue: [state := 1]. "*a -> a"
+ 			(state == 3) ifTrue: [
+ 				"AAa -> A + Aa (camel case follows uppercase)"
+ 				aBlock value: last value: index - 2.
+ 				last := index - 1.
+ 				state := 2].
+ 			(state > 3) ifTrue: [
+ 				"+a -> + | a (letter follows non-letter)"
+ 				aBlock value: last value: index - 1.
+ 				last := index.
+ 				state := 1]]
+ 		ifFalse: [
+ 			char isUppercase ifTrue: [
+ 				(state == 0)
+ 					ifTrue: [state := 2] "start -> A"
+ 					ifFalse: [
+ 						(state < 2 or: [state > 3]) ifTrue: [
+ 							"*A -> * | A (uppercase begins, flush before)"
+ 							aBlock value: last value: index - 1.
+ 							last := index.
+ 							state := 2] ifFalse: [
+ 								"AA -> AA (uppercase continues)"
+ 								state := 3]]]
+ 			ifFalse: [
+ 				("char == $: or:" char isSeparator) ifTrue: [
+ 					"skip colon/whitespace"
+ 					(state > 0) ifTrue: [
+ 						aBlock value: last value: index - 1.
+ 						state := 0].
+ 					last := index + 1]
+ 				ifFalse: [
+ 					char isDigit ifTrue: [
+ 						(state == 0)
+ 							ifTrue: [state := 4]
+ 							ifFalse: [
+ 								(state ~= 4) ifTrue: [
+ 									aBlock value: last value: index - 1.
+ 									last := index.
+ 									state := 4]]]
+ 					ifFalse: [
+ 						(state == 0)
+ 							ifTrue: [state := 5]
+ 							ifFalse: [
+ 								(state < 5) ifTrue: [
+ 									aBlock value: last value: index - 1.
+ 									last := index.
+ 									state := 5]]]]]]].
+ 	last <= self size ifTrue: [
+ 		aBlock value: last value: self size]!

Item was added:
+ ----- Method: String>>findFeatures (in category 'accessing - features') -----
+ findFeatures
+
+ 	^ Array streamContents: [:features |
+ 		self findFeaturesDo: [:feature | features nextPut: feature]]!

Item was added:
+ ----- Method: String>>findFeaturesDo: (in category 'accessing - features') -----
+ findFeaturesDo: aBlock
+ 	"Simple analysis for natural language in source code. No support for word stemming."
+
+ 	self findFeatureIndicesDo: [:start :end |
+ 		(self at: start) isLetter ifTrue: [
+ 			aBlock value: (self copyFrom: start to: end) asLowercase]].!

Item was changed:
  ----- Method: String>>findTokens: (in category 'accessing') -----
  findTokens: delimiters
+ 	"Answer the collection of tokens that result from parsing self."
+
+ 	^ OrderedCollection streamContents: [:tokens |
+ 		self
+ 			findTokens: delimiters
+ 			do: [:token | tokens nextPut: token]]!
- 	"Answer the collection of tokens that result from parsing self. Return strings between the delimiters. Any character in the Collection delimiters marks a border. Several delimiters in a row are considered as just one separation. Also, allow delimiters to be a single character."
-
- 	| tokens keyStart keyStop separators |
-
- 	tokens := OrderedCollection new.
- 	separators := delimiters isCharacter
- 		ifTrue: [Array with: delimiters]
- 		ifFalse: [delimiters].
- 	keyStop := 1.
- 	[keyStop <= self size] whileTrue:
- 		[keyStart := self skipDelimiters: separators startingAt: keyStop.
- 		keyStop := self findDelimiters: separators startingAt: keyStart.
- 		keyStart < keyStop
- 			ifTrue: [tokens add: (self copyFrom: keyStart to: (keyStop - 1))]].
- 	^tokens!

Item was added:
+ ----- Method: String>>findTokens:do: (in category 'accessing') -----
+ findTokens: delimiters do: aBlock
+
+ 	self
+ 		findTokens: delimiters
+ 		indicesDo: [:start :end | aBlock value: (self copyFrom: start to: end)].!

Item was added:
+ ----- Method: String>>findTokens:indicesDo: (in category 'accessing') -----
+ findTokens: delimiters indicesDo: aBlock
+ 	"Parse self to find tokens between delimiters. Any character in the Collection delimiters marks a border. Several delimiters in a row are considered as just one separation. Also, allow delimiters to be a single character. Similar to #lineIndicesDo:."
+
+ 	| tokens keyStart keyStop separators |
+ 	separators := delimiters isCharacter
+ 		ifTrue: [Array with: delimiters]
+ 		ifFalse: [delimiters].
+ 	keyStop := 1.
+ 	[keyStop <= self size] whileTrue: [
+ 		keyStart := self skipDelimiters: separators startingAt: keyStop.
+ 		keyStop := self findDelimiters: separators startingAt: keyStart.
+ 		keyStart < keyStop
+ 			ifTrue: [aBlock value: keyStart value: keyStop - 1]].!
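
For readers who want to try the new selectors, here is a small workspace sketch. It uses string literals instead of class names so it does not depend on particular classes being loaded. The quoted results were traced by hand through the state machine in the diff above, not taken from a running image, so treat them as expectations to check. The wildcard pattern at the end is a hypothetical illustration of the search idea in the commit message, not part of this commit.

```smalltalk
"Evaluate each expression with 'print it' in an image that has Collections-mt.838 loaded."

'WeakIdentityDictionary' findFeatures.
"expected: #('weak' 'identity' 'dictionary')"

'HTTPDownloadRequest' findFeatures.
"expected: #('http' 'download' 'request'); an uppercase run is split before its last letter"

'MCAddition' findFeatures.
"expected: #('mc' 'addition')"

'drawOn:' findFeatures.
"expected: #('draw' 'on'); the colon is reported as an operator range and dropped by the isLetter check"

"Token ranges without copying substrings."
'a,b;;c' findTokens: ',;' indicesDo: [:start :end |
	Transcript showln: start printString, ' to ', end printString].
"expected ranges: 1 to 1, 3 to 3, 6 to 6"

"Hypothetical: turn features into a wildcard pattern for searching."
| pattern |
pattern := 'WeakDictionary' findFeatures
	inject: '*'
	into: [:soFar :feature | soFar, feature, '*'].
pattern.
"expected: '*weak*dictionary*'"
pattern match: 'WeakIdentityDictionary'.
"expected: true, since String>>match: ignores case"
```

Note that the colon test in #findFeatureIndicesDo: is commented out, so a colon takes the operator path (state 5) rather than the separator path. #findFeaturesDo: then discards it because the range does not start with a letter.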