The Trunk: Collections-mt.838.mcz

commits-2
Marcel Taeumel uploaded a new version of Collections to project The Trunk:
http://source.squeak.org/trunk/Collections-mt.838.mcz

==================== Summary ====================

Name: Collections-mt.838
Author: mt
Time: 4 July 2019, 4:32:40.854026 pm
UUID: ac4ab442-79c0-d246-8dec-914be7ee5356
Ancestors: Collections-pre.837

Adds a simple analysis of natural language in source code to String (no word stemming).

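1) Refactor #findTokens: to follow the pattern of #lines (i.e., #linesDo: and #lineIndicesDo:) by adding #findTokens:do: and #findTokens:indicesDo:.
2) Add #findFeatures, #findFeaturesDo:, and #findFeatureIndicesDo:, analogous to #findTokens:, #findTokens:do:, and #findTokens:indicesDo: (and to #lines, #linesDo:, and #lineIndicesDo:).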

Try this:

HTTPDownloadRequest name findFeatures.
(Morph >> #drawOn:) getSource asString findFeatures.

Where can this be useful?

- Automatically inserting "*" into search terms such as "WeakDictionary" so that they also find WeakIdentityDictionary, etc.
- Emphasizing common prefixes in lists of class names in code browsers, such as MCAddition and MCAncestry.
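A minimal workspace sketch of the first idea, assuming an image with this version of Collections loaded. Joining the features of a search term with "*" is a hypothetical illustration, not code from this commit or from any browser; it relies on String>>match:, which ignores case and lets "*" stand for any sequence of characters. Expected results in the comments are traced by hand.

```smalltalk
"Hypothetical pattern builder: join the lowercase features of a search term with $*."
| pattern |
pattern := String streamContents: [:stream |
	'WeakDictionary' findFeatures
		do: [:feature | stream nextPutAll: feature]
		separatedBy: [stream nextPut: $*]].
pattern. "'weak*dictionary'"
(pattern match: 'WeakIdentityDictionary'). "true"
(pattern match: 'WeakKeyDictionary'). "true"
(pattern match: 'IdentityDictionary'). "false, the pattern must match from the start"
```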

=============== Diff against Collections-pre.837 ===============

Item was added:
+ ----- Method: String>>findFeatureIndicesDo: (in category 'accessing - features') -----
+ findFeatureIndicesDo: aBlock
+ 	"State machine that separates camelCase, UPPERCase, number/operator combinations and skips colons"
+ 	| last state char "0 = start, 1 = a, 2 = A, 3 = AA, 4 = num, 5 = op"  |
+
+ 	state := 0.
+ 	last := 1.
+
+ 	1 to: self size do: [ :index |
+ 		char := self at: index.
+ 		"a"
+ 		char isLowercase ifTrue: [
+ 			(state < 3) ifTrue: [state := 1]. "*a -> a"
+ 			(state == 3) ifTrue: [
+ 				"AAa -> A + Aa (camel case follows uppercase)"
+ 				aBlock value: last value: index - 2.
+ 				last := index - 1.
+ 				state := 2].
+ 			(state > 3) ifTrue: [
+ 				"+a -> + | a (letter follows non-letter)"
+ 				aBlock value: last value: index - 1.
+ 				last := index.
+ 				state := 1]]
+ 		ifFalse: [
+ 			char isUppercase ifTrue: [
+ 				(state == 0)
+ 					ifTrue: [state := 2] "start -> A"
+ 					ifFalse: [
+ 						(state < 2 or: [state > 3]) ifTrue: [
+ 							"*A -> * | A (uppercase begins, flush before)"
+ 							aBlock value: last value: index - 1.
+ 							last := index.
+ 							state := 2] ifFalse: [
+ 							"AA -> AA (uppercase continues)"
+ 							state := 3]]]
+ 			ifFalse: [
+ 				("char == $: or:" char isSeparator) ifTrue: [
+ 					"skip colon/whitespace"
+ 					(state > 0) ifTrue: [
+ 						aBlock value: last value: index - 1.
+ 						state := 0].
+ 					last := index + 1]
+ 				ifFalse: [
+ 					char isDigit ifTrue: [
+ 						(state == 0)
+ 							ifTrue: [state := 4]
+ 							ifFalse: [
+ 								(state ~= 4) ifTrue: [
+ 									aBlock value: last value: index - 1.
+ 									last := index.
+ 									state := 4]]]
+ 					ifFalse: [
+ 						(state == 0)
+ 							ifTrue: [state := 5]
+ 							ifFalse: [
+ 								(state < 5) ifTrue: [
+ 									aBlock value: last value: index - 1.
+ 									last := index.
+ 									state := 5]]]]]]].
+ 	last <= self size ifTrue: [
+ 		aBlock value: last value: self size]!
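To see where the state machine cuts a string before any filtering, one can collect the raw ranges it reports. This is a workspace sketch assuming this version of Collections is loaded; the expected results were traced by hand against the method above. Since the colon test on the isSeparator line is commented out, a colon is not skipped but comes out as its own operator-class range, and digits likewise form their own ranges.

```smalltalk
"Collect the raw substrings reported by #findFeatureIndicesDo:, without lowercasing or filtering."
| pieces |
pieces := [:string | Array streamContents: [:stream |
	string findFeatureIndicesDo: [:start :end |
		stream nextPut: (string copyFrom: start to: end)]]].
pieces value: 'HTTPDownloadRequest'. "#('HTTP' 'Download' 'Request')"
pieces value: 'utf8Encode'. "#('utf' '8' 'Encode')"
pieces value: 'drawOn: aCanvas'. "#('draw' 'On' ':' 'a' 'Canvas')"
```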

Item was added:
+ ----- Method: String>>findFeatures (in category 'accessing - features') -----
+ findFeatures
+
+ 	^ Array streamContents: [:features |
+ 		self findFeaturesDo: [:feature | features nextPut: feature]]!
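#findFeatures collects into an Array whatever #findFeaturesDo: yields, that is, the letter-initial pieces in lowercase. A workspace check with hand-traced expected results, using a string literal instead of a class name so that it does not depend on HTTPDownloadRequest being present in the image:

```smalltalk
'HTTPDownloadRequest' findFeatures. "#('http' 'download' 'request')"
'findTokens:do:' findFeatures. "#('find' 'tokens' 'do'), the colon ranges are dropped"
```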

Item was added:
+ ----- Method: String>>findFeaturesDo: (in category 'accessing - features') -----
+ findFeaturesDo: aBlock
+ 	"Simple analysis for natural language in source code. No support for word stemming."
+
+ 	self findFeatureIndicesDo: [:start :end |
+ 		(self at: start) isLetter ifTrue: [
+ 			aBlock value: (self copyFrom: start to: end) asLowercase]].!
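Because #findFeaturesDo: hands each feature to a block, it can feed a word-frequency count directly, for example over method source as in the "Try this" expressions. A small sketch using a Bag; the input string is made up for illustration and the expected counts are traced by hand:

```smalltalk
| words |
words := Bag new.
'fooBar barBaz' findFeaturesDo: [:word | words add: word].
words occurrencesOf: 'bar'. "2"
words occurrencesOf: 'baz'. "1"
words size. "4"
```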

Item was changed:
  ----- Method: String>>findTokens: (in category 'accessing') -----
  findTokens: delimiters
+ 	"Answer the collection of tokens that result from parsing self."
+
+ 	^ OrderedCollection streamContents: [:tokens |
+ 		self
+ 			findTokens: delimiters
+ 			do: [:token | tokens nextPut: token]]!
- 	"Answer the collection of tokens that result from parsing self.  Return strings between the delimiters.  Any character in the Collection delimiters marks a border.  Several delimiters in a row are considered as just one separation.  Also, allow delimiters to be a single character."
-
- 	| tokens keyStart keyStop separators |
-
- 	tokens := OrderedCollection new.
- 	separators := delimiters isCharacter
- 		ifTrue: [Array with: delimiters]
- 		ifFalse: [delimiters].
- 	keyStop := 1.
- 	[keyStop <= self size] whileTrue:
- 		[keyStart := self skipDelimiters: separators startingAt: keyStop.
- 		keyStop := self findDelimiters: separators startingAt: keyStart.
- 		keyStart < keyStop
- 			ifTrue: [tokens add: (self copyFrom: keyStart to: (keyStop - 1))]].
- 	^tokens!
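The refactoring is meant to leave the observable behavior of #findTokens: unchanged: several delimiters in a row still count as one separation, and a single Character is still accepted as the delimiter argument. A quick workspace check, with hand-traced expected results (converted with asArray so the comparison does not depend on the collection class):

```smalltalk
('a, b,,c' findTokens: ', ') asArray. "#('a' 'b' 'c')"
('one two  three' findTokens: Character space) asArray. "#('one' 'two' 'three')"
```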

Item was added:
+ ----- Method: String>>findTokens:do: (in category 'accessing') -----
+ findTokens: delimiters do: aBlock
+
+ 	self
+ 		findTokens: delimiters
+ 		indicesDo: [:start :end | aBlock value: (self copyFrom: start to: end)].!
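#findTokens:do: passes each token to the block as soon as it is found, so a caller that only consumes tokens need not build the intermediate collection that #findTokens: answers. A small sketch that counts distinct tokens; the Set and the input are illustrative, and the expected results are traced by hand:

```smalltalk
| seen |
seen := Set new.
'red green red blue' findTokens: ' ' do: [:token | seen add: token].
seen size. "3"
(seen includes: 'green'). "true"
```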

Item was added:
+ ----- Method: String>>findTokens:indicesDo: (in category 'accessing') -----
+ findTokens: delimiters indicesDo: aBlock
+ 	"Parse self to find tokens between delimiters. Any character in the Collection delimiters marks a border.  Several delimiters in a row are considered as just one separation.  Also, allow delimiters to be a single character. Similar to #lineIndicesDo:."
+
+ 	| tokens keyStart keyStop separators |
+ 	separators := delimiters isCharacter
+ 		ifTrue: [Array with: delimiters]
+ 		ifFalse: [delimiters].
+ 	keyStop := 1.
+ 	[keyStop <= self size] whileTrue: [
+ 		keyStart := self skipDelimiters: separators startingAt: keyStop.
+ 		keyStop := self findDelimiters: separators startingAt: keyStart.
+ 		keyStart < keyStop
+ 			ifTrue: [aBlock value: keyStart value: keyStop - 1]].!
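#findTokens:indicesDo: reports only the start and end index of each token and copies nothing, which suits callers that want positions in the receiver rather than new strings. A sketch collecting those ranges as Associations (start -> end), with the expected result traced by hand:

```smalltalk
| ranges |
ranges := OrderedCollection new.
'a, b,,c' findTokens: ', ' indicesDo: [:start :end | ranges add: start -> end].
ranges asArray. "{1->1. 4->4. 7->7}"
```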