The Inbox: Collections-mt.838.mcz

commits-2
A new version of Collections was added to project The Inbox:
http://source.squeak.org/inbox/Collections-mt.838.mcz

==================== Summary ====================

Name: Collections-mt.838
Author: mt
Time: 4 July 2019, 4:32:40.854026 pm
UUID: ac4ab442-79c0-d246-8dec-914be7ee5356
Ancestors: Collections-pre.837

To String, adds simple analysis of natural language in source code. No word stemming.

1) Refactor #findTokens: to look like #lines (i.e. #linesDo: and #lineIndicesDo:).
2) Add #findFeaturesDo: like #findTokens:do: and #linesDo:.

Try this:

HTTPDownloadRequest name findFeatures.
(Morph >> #drawOn:) getSource asString findFeatures.

Where can that be useful?

- Automatic insertion of "*" for search terms like "WeakDictionary" to also find WeakIdentityDictionary etc.
- Prefix emphasis for lists of class names in code browsers: MCAddition, MCAncestry, etc.
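
As a rough illustration of the first use case, a wildcard pattern could be derived from the features of a search term. This is only a workspace sketch that assumes Collections-mt.838 is loaded; the variable names are made up, and only #findFeatures comes from this commit. The result comments were traced by hand, not taken from a running image.

```smalltalk
"Workspace sketch: build a wildcard pattern from the features of a search term."
| term pattern |
term := 'WeakDictionary'.
pattern := term findFeatures
	inject: ''
	into: [:soFar :feature | soFar, feature, '*'].
pattern. "'weak*dictionary*'"

"String>>#match: ignores case, so this is expected to answer true."
pattern match: 'WeakIdentityDictionary'.
```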

=============== Diff against Collections-pre.837 ===============

Item was added:
+ ----- Method: String>>findFeatureIndicesDo: (in category 'accessing - features') -----
+ findFeatureIndicesDo: aBlock
+ "State machine that separates camelCase, UPPERCase, number/operator combinations and skips colons"
+ | last state char "0 = start, 1 = a, 2 = A, 3 = AA, 4 = num, 5 = op"  |
+
+ state := 0.
+ last := 1.
+
+ 1 to: self size do: [ :index |
+ char := self at: index.
+ "a"
+ char isLowercase ifTrue: [
+ (state < 3) ifTrue: [state := 1]. "*a -> a"
+ (state == 3) ifTrue: [
+ "AAa -> A + Aa (camel case follows uppercase)"
+ aBlock value: last value: index - 2.
+ last := index - 1.
+ state := 2].
+ (state > 3) ifTrue: [
+ "+a -> + | a (letter follows non-letter)"
+ aBlock value: last value: index - 1.
+ last := index.
+ state := 1]]
+ ifFalse: [
+ char isUppercase ifTrue: [
+ (state == 0)
+ ifTrue: [state := 2] "start -> A"
+ ifFalse: [
+ (state < 2 or: [state > 3]) ifTrue: [
+ "*A -> * | A (uppercase begins, flush before)"
+ aBlock value: last value: index - 1.
+ last := index.
+ state := 2] ifFalse: [
+ "AA -> AA (uppercase continues)"
+ state := 3]]]
+ ifFalse: [
+ ("char == $: or:" char isSeparator) ifTrue: [
+ "skip colon/whitespace"
+ (state > 0) ifTrue: [
+ aBlock value: last value: index - 1.
+ state := 0].
+ last := index + 1]
+ ifFalse: [
+ char isDigit ifTrue: [
+ (state == 0)
+ ifTrue: [state := 4]
+ ifFalse: [
+ (state ~= 4) ifTrue: [
+ aBlock value: last value: index - 1.
+ last := index.
+ state := 4]]]
+ ifFalse: [
+ (state == 0)
+ ifTrue: [state := 5]
+ ifFalse: [
+ (state < 5) ifTrue: [
+ aBlock value: last value: index - 1.
+ last := index.
+ state := 5]]]]]]].
+ last <= self size ifTrue: [
+ aBlock value: last value: self size]!

Item was added:
+ ----- Method: String>>findFeatures (in category 'accessing - features') -----
+ findFeatures
+
+ ^ Array streamContents: [:features |
+ self findFeaturesDo: [:feature | features nextPut: feature]]!

Item was added:
+ ----- Method: String>>findFeaturesDo: (in category 'accessing - features') -----
+ findFeaturesDo: aBlock
+ "Simple analysis for natural language in source code. No support for word stemming."
+
+ self findFeatureIndicesDo: [:start :end |
+ (self at: start) isLetter ifTrue: [
+ aBlock value: (self copyFrom: start to: end) asLowercase]].!

Item was changed:
  ----- Method: String>>findTokens: (in category 'accessing') -----
  findTokens: delimiters
+ "Answer the collection of tokens that result from parsing self."
+
+ ^ OrderedCollection streamContents: [:tokens |
+ self
+ findTokens: delimiters
+ do: [:token | tokens nextPut: token]]!
- "Answer the collection of tokens that result from parsing self.  Return strings between the delimiters.  Any character in the Collection delimiters marks a border.  Several delimiters in a row are considered as just one separation.  Also, allow delimiters to be a single character."
-
- | tokens keyStart keyStop separators |
-
- tokens := OrderedCollection new.
- separators := delimiters isCharacter
- ifTrue: [Array with: delimiters]
- ifFalse: [delimiters].
- keyStop := 1.
- [keyStop <= self size] whileTrue:
- [keyStart := self skipDelimiters: separators startingAt: keyStop.
- keyStop := self findDelimiters: separators startingAt: keyStart.
- keyStart < keyStop
- ifTrue: [tokens add: (self copyFrom: keyStart to: (keyStop - 1))]].
- ^tokens!

Item was added:
+ ----- Method: String>>findTokens:do: (in category 'accessing') -----
+ findTokens: delimiters do: aBlock
+
+ self
+ findTokens: delimiters
+ indicesDo: [:start :end | aBlock value: (self copyFrom: start to: end)].!

Item was added:
+ ----- Method: String>>findTokens:indicesDo: (in category 'accessing') -----
+ findTokens: delimiters indicesDo: aBlock
+ "Parse self to find tokens between delimiters. Any character in the Collection delimiters marks a border.  Several delimiters in a row are considered as just one separation.  Also, allow delimiters to be a single character. Similar to #lineIndicesDo:."
+
+ | tokens keyStart keyStop separators |
+ separators := delimiters isCharacter
+ ifTrue: [Array with: delimiters]
+ ifFalse: [delimiters].
+ keyStop := 1.
+ [keyStop <= self size] whileTrue: [
+ keyStart := self skipDelimiters: separators startingAt: keyStop.
+ keyStop := self findDelimiters: separators startingAt: keyStart.
+ keyStart < keyStop
+ ifTrue: [aBlock value: keyStart value: keyStop - 1]].!
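
The two enumeration selectors added for tokens can be tried in a workspace roughly like this, assuming Collections-mt.838 is loaded. The sample string and delimiters are arbitrary, and the result comments were traced by hand from the code above.

```smalltalk
"Workspace sketch for #findTokens:do: and #findTokens:indicesDo:."
| tokens ranges |
tokens := OrderedCollection new.
'a,b;;c' findTokens: ',;' do: [:token | tokens addLast: token].
tokens. "an OrderedCollection('a' 'b' 'c')"

"The indices variant reports start/end positions instead of copies."
ranges := OrderedCollection new.
'a,b;;c'
	findTokens: ',;'
	indicesDo: [:start :end | ranges addLast: start -> end].
ranges. "an OrderedCollection(1->1 3->3 6->6)"
```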


Re: The Inbox: Collections-mt.838.mcz

marcel.taeumel
Credits go to Toni Mattis (https://github.com/amintos) for the idea and implementation! :-)

Thanks!

On 04.07.2019 at 16:32:51, [hidden email] <[hidden email]> wrote:

A new version of Collections was added to project The Inbox:
http://source.squeak.org/inbox/Collections-mt.838.mcz

==================== Summary ====================

Name: Collections-mt.838
Author: mt
Time: 4 July 2019, 4:32:40.854026 pm
UUID: ac4ab442-79c0-d246-8dec-914be7ee5356
Ancestors: Collections-pre.837

To String, adds simple analysis of natural language in source code. No word stemming.

1) Refactor #findTokens: to look like #lines (i.e. #linesDo: and #lineIndicesDo:).
2) Add #findFeaturesDo: like #findTokens:do: and #linesDo:.

Try this:

HTTPDownloadRequest name findFeatures.
(Morph >> #drawOn:) getSource asString findFeatures.

Where can that be useful?

- Automatic insertion of "*" for search terms like "WeakDictionary" to also find WeakIdentityDictionary etc.
- Prefix emphasis for lists of class names in code browsers: MCAddition, MCAncestry, etc.

=============== Diff against Collections-pre.837 ===============

Item was added:
+ ----- Method: String>>findFeatureIndicesDo: (in category 'accessing - features') -----
+ findFeatureIndicesDo: aBlock
+ "State machine that separates camelCase, UPPERCase, number/operator combinations and skips colons"
+ | last state char "0 = start, 1 = a, 2 = A, 3 = AA, 4 = num, 5 = op" |
+
+ state := 0.
+ last := 1.
+
+ 1 to: self size do: [ :index |
+ char := self at: index.
+ "a"
+ char isLowercase ifTrue: [
+ (state < 3) ifTrue: [state := 1]. "*a -> a"
+ (state == 3) ifTrue: [
+ "AAa -> A + Aa (camel case follows uppercase)"
+ aBlock value: last value: index - 2.
+ last := index - 1.
+ state := 2].
+ (state > 3) ifTrue: [
+ "+a -> + | a (letter follows non-letter)"
+ aBlock value: last value: index - 1.
+ last := index.
+ state := 1]]
+ ifFalse: [
+ char isUppercase ifTrue: [
+ (state == 0)
+ ifTrue: [state := 2] "start -> A"
+ ifFalse: [
+ (state < 2 or: [state > 3]) ifTrue: [
+ "*A -> * | A (uppercase begins, flush before)"
+ aBlock value: last value: index - 1.
+ last := index.
+ state := 2] ifFalse: [
+ "AA -> AA (uppercase continues)"
+ state := 3]]]
+ ifFalse: [
+ ("char == $: or:" char isSeparator) ifTrue: [
+ "skip colon/whitespace"
+ (state > 0) ifTrue: [
+ aBlock value: last value: index - 1.
+ state := 0].
+ last := index + 1]
+ ifFalse: [
+ char isDigit ifTrue: [
+ (state == 0)
+ ifTrue: [state := 4]
+ ifFalse: [
+ (state ~= 4) ifTrue: [
+ aBlock value: last value: index - 1.
+ last := index.
+ state := 4]]]
+ ifFalse: [
+ (state == 0)
+ ifTrue: [state := 5]
+ ifFalse: [
+ (state < 5) ifTrue: [
+ aBlock value: last value: index - 1.
+ last := index.
+ state := 5]]]]]]].
+ last <= self size ifTrue: [
+ aBlock value: last value: self size]!

Item was added:
+ ----- Method: String>>findFeatures (in category 'accessing - features') -----
+ findFeatures
+
+ ^ Array streamContents: [:features |
+ self findFeaturesDo: [:feature | features nextPut: feature]]!

Item was added:
+ ----- Method: String>>findFeaturesDo: (in category 'accessing - features') -----
+ findFeaturesDo: aBlock
+ "Simple analysis for natural language in source code. No support for word stemming."
+
+ self findFeatureIndicesDo: [:start :end |
+ (self at: start) isLetter ifTrue: [
+ aBlock value: (self copyFrom: start to: end) asLowercase]].!

Item was changed:
----- Method: String>>findTokens: (in category 'accessing') -----
findTokens: delimiters
+ "Answer the collection of tokens that result from parsing self."
+
+ ^ OrderedCollection streamContents: [:tokens |
+ self
+ findTokens: delimiters
+ do: [:token | tokens nextPut: token]]!
- "Answer the collection of tokens that result from parsing self. Return strings between the delimiters. Any character in the Collection delimiters marks a border. Several delimiters in a row are considered as just one separation. Also, allow delimiters to be a single character."
-
- | tokens keyStart keyStop separators |
-
- tokens := OrderedCollection new.
- separators := delimiters isCharacter
- ifTrue: [Array with: delimiters]
- ifFalse: [delimiters].
- keyStop := 1.
- [keyStop <= self size] whileTrue:
- [keyStart := self skipDelimiters: separators startingAt: keyStop.
- keyStop := self findDelimiters: separators startingAt: keyStart.
- keyStart < keyStop
- ifTrue: [tokens add: (self copyFrom: keyStart to: (keyStop - 1))]].
- ^tokens!

Item was added:
+ ----- Method: String>>findTokens:do: (in category 'accessing') -----
+ findTokens: delimiters do: aBlock
+
+ self
+ findTokens: delimiters
+ indicesDo: [:start :end | aBlock value: (self copyFrom: start to: end)].!

Item was added:
+ ----- Method: String>>findTokens:indicesDo: (in category 'accessing') -----
+ findTokens: delimiters indicesDo: aBlock
+ "Parse self to find tokens between delimiters. Any character in the Collection delimiters marks a border. Several delimiters in a row are considered as just one separation. Also, allow delimiters to be a single character. Similar to #lineIndicesDo:."
+
+ | tokens keyStart keyStop separators |
+ separators := delimiters isCharacter
+ ifTrue: [Array with: delimiters]
+ ifFalse: [delimiters].
+ keyStop := 1.
+ [keyStop <= self size] whileTrue: [
+ keyStart := self skipDelimiters: separators startingAt: keyStop.
+ keyStop := self findDelimiters: separators startingAt: keyStart.
+ keyStart < keyStop
+ ifTrue: [aBlock value: keyStart value: keyStop - 1]].!




Re: The Inbox: Collections-mt.838.mcz

Levente Uzonyi
In reply to this post by commits-2
On Thu, 4 Jul 2019, [hidden email] wrote:

> A new version of Collections was added to project The Inbox:
> http://source.squeak.org/inbox/Collections-mt.838.mcz
>
> ==================== Summary ====================
>
> Name: Collections-mt.838
> Author: mt
> Time: 4 July 2019, 4:32:40.854026 pm
> UUID: ac4ab442-79c0-d246-8dec-914be7ee5356
> Ancestors: Collections-pre.837
>
> To String, adds simple analysis of natural language in source code. No word stemming.
>
> 1) Refactor #findTokens: to look like #lines (i.e. #linesDo: and #lineIndicesDo:).
> 2) Add #findFeaturesDo: like #findTokens:do: and #linesDo:.
>
> Try this:
>
> HTTPDownloadRequest name findFeatures.
> (Morph >> #drawOn:) getSource asString findFeatures.
>
> Where can that be useful?
>
> - Automatic insertion of "*" for search terms like "WeakDictionary" to also find WeakIdentityDictionary etc.
> - Prefix emphasis for lists of class names in code browsers: MCAddition, MCAncestry, etc.

Given the new methods' complexity, I think they deserve tests.
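
A test along these lines might look as follows. This is only a sketch: the selector name and its placement in a String test case are hypothetical, and the expected values were traced by hand from the state machine in this diff rather than taken from an existing test package.

```smalltalk
testFindFeatures
	"Hypothetical SUnit test method for String>>#findFeatures."

	"An uppercase run followed by camel case is split before the last capital."
	self
		assert: #('http' 'download' 'request')
		equals: 'HTTPDownloadRequest' findFeatures.

	"Chunks that do not start with a letter (here the colon) are dropped."
	self
		assert: #('draw' 'on' 'a' 'canvas')
		equals: 'drawOn: aCanvas' findFeatures.

	"An empty string has no features."
	self
		assert: #()
		equals: '' findFeatures.
```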

>
> =============== Diff against Collections-pre.837 ===============
>
> Item was added:
> + ----- Method: String>>findFeatureIndicesDo: (in category 'accessing - features') -----
> + findFeatureIndicesDo: aBlock
> + "State machine that separates camelCase, UPPERCase, number/operator combinations and skips colons"

I think an example would make it easier to understand what this
method does. (The same applies to #findTokens:, but I'm already familiar
with that.)
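
One possible example, sketched for a workspace with Collections-mt.838 loaded: it collects the raw chunks reported by #findFeatureIndicesDo: and compares them with #findFeatures. The input string and variable names are arbitrary, and the result comments were traced by hand from the state machine quoted here.

```smalltalk
"Workspace sketch: raw chunks versus filtered, lowercased features."
| source chunks |
source := 'drawOn: aCanvas'.
chunks := OrderedCollection new.
source findFeatureIndicesDo: [:start :end |
	chunks addLast: (source copyFrom: start to: end)].
chunks. "an OrderedCollection('draw' 'On' ':' 'a' 'Canvas')"

"#findFeatures keeps only chunks starting with a letter and lowercases them."
source findFeatures. "#('draw' 'on' 'a' 'canvas')"
```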

> + | last state char "0 = start, 1 = a, 2 = A, 3 = AA, 4 = num, 5 = op"  |
> +
> + state := 0.
> + last := 1.
> +
> + 1 to: self size do: [ :index |
> + char := self at: index.
> + "a"
> + char isLowercase ifTrue: [
> + (state < 3) ifTrue: [state := 1]. "*a -> a"
> + (state == 3) ifTrue: [

#= is optimized just as well as #== when the argument is a constant. Using
#= and dropping the unnecessary parentheses would make the code look a bit
less "C-style".
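
A small workspace sketch of the two styles, using one of the conditions from the diff. The parentheses can go because binary messages such as #= bind tighter than keyword messages such as #ifTrue:.

```smalltalk
| state |

"As in the diff: identity comparison with explicit parentheses."
state := 3.
(state == 3) ifTrue: [state := 2].

"As suggested: #= and no parentheses; parsed as (state = 3) ifTrue: [...]."
state := 3.
state = 3 ifTrue: [state := 2].

state. "2 in both variants"
```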

> + "AAa -> A + Aa (camel case follows uppercase)"
> + aBlock value: last value: index - 2.
> + last := index - 1.
> + state := 2].
> + (state > 3) ifTrue: [
> + "+a -> + | a (letter follows non-letter)"
> + aBlock value: last value: index - 1.
> + last := index.
> + state := 1]]
> + ifFalse: [
> + char isUppercase ifTrue: [
> + (state == 0)
> + ifTrue: [state := 2] "start -> A"
> + ifFalse: [
> + (state < 2 or: [state > 3]) ifTrue: [
> + "*A -> * | A (uppercase begins, flush before)"
> + aBlock value: last value: index - 1.
> + last := index.
> + state := 2] ifFalse: [
> + "AA -> AA (uppercase continues)"
> + state := 3]]]
> + ifFalse: [
> + ("char == $: or:" char isSeparator) ifTrue: [
> + "skip colon/whitespace"
> + (state > 0) ifTrue: [
> + aBlock value: last value: index - 1.
> + state := 0].
> + last := index + 1]
> + ifFalse: [
> + char isDigit ifTrue: [
> + (state == 0)
> + ifTrue: [state := 4]
> + ifFalse: [
> + (state ~= 4) ifTrue: [
> + aBlock value: last value: index - 1.
> + last := index.
> + state := 4]]]
> + ifFalse: [
> + (state == 0)
> + ifTrue: [state := 5]
> + ifFalse: [
> + (state < 5) ifTrue: [
> + aBlock value: last value: index - 1.
> + last := index.
> + state := 5]]]]]]].
> + last <= self size ifTrue: [
> + aBlock value: last value: self size]!
>
> Item was added:
> + ----- Method: String>>findFeatures (in category 'accessing - features') -----
> + findFeatures
> +
> + ^ Array streamContents: [:features |
> + self findFeaturesDo: [:feature | features nextPut: feature]]!
>
> Item was added:
> + ----- Method: String>>findFeaturesDo: (in category 'accessing - features') -----
> + findFeaturesDo: aBlock
> + "Simple analysis for natural language in source code. No support for word stemming."
> +
> + self findFeatureIndicesDo: [:start :end |
> + (self at: start) isLetter ifTrue: [
> + aBlock value: (self copyFrom: start to: end) asLowercase]].!
>
> Item was changed:
>  ----- Method: String>>findTokens: (in category 'accessing') -----
>  findTokens: delimiters
> + "Answer the collection of tokens that result from parsing self."
> +
> + ^ OrderedCollection streamContents: [:tokens |

#streamContents: should never be used with OrderedCollection.
OrderedCollection has its own streaming API (I would use #addLast: here)
which is way more efficient.
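
One way to read this suggestion, sketched without #streamContents:. This is an illustration of the idea, not necessarily the code that ends up in a later version.

```smalltalk
findTokens: delimiters
	"Answer the collection of tokens that result from parsing self."

	| tokens |
	tokens := OrderedCollection new.
	"Add directly to the collection instead of going through a write stream."
	self
		findTokens: delimiters
		do: [:token | tokens addLast: token].
	^ tokens
```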

> + self
> + findTokens: delimiters
> + do: [:token | tokens nextPut: token]]!
> - "Answer the collection of tokens that result from parsing self.  Return strings between the delimiters.  Any character in the Collection delimiters marks a border.  Several delimiters in a row are considered as just one separation.  Also, allow delimiters to be a single character."
> -
> - | tokens keyStart keyStop separators |
> -
> - tokens := OrderedCollection new.
> - separators := delimiters isCharacter
> - ifTrue: [Array with: delimiters]
> - ifFalse: [delimiters].
> - keyStop := 1.
> - [keyStop <= self size] whileTrue:
> - [keyStart := self skipDelimiters: separators startingAt: keyStop.
> - keyStop := self findDelimiters: separators startingAt: keyStart.
> - keyStart < keyStop
> - ifTrue: [tokens add: (self copyFrom: keyStart to: (keyStop - 1))]].
> - ^tokens!
>
> Item was added:
> + ----- Method: String>>findTokens:do: (in category 'accessing') -----
> + findTokens: delimiters do: aBlock
> +
> + self
> + findTokens: delimiters
> + indicesDo: [:start :end | aBlock value: (self copyFrom: start to: end)].!
>
> Item was added:
> + ----- Method: String>>findTokens:indicesDo: (in category 'accessing') -----
> + findTokens: delimiters indicesDo: aBlock
> + "Parse self to find tokens between delimiters. Any character in the Collection delimiters marks a border.  Several delimiters in a row are considered as just one separation.  Also, allow delimiters to be a single character. Similar to #lineIndicesDo:."
> +
> + | tokens keyStart keyStop separators |

There are a few opportunities to regain the performance lost with the
introduction of blocks and sends:
- the tokens temporary is unused
- self size should be cached in a temporary (size)
- instead of Array >> #with:, a brace array should be used
- | keyStop keyStart separators size | would probably yield the best
performance

Levente

> + separators := delimiters isCharacter
> + ifTrue: [Array with: delimiters]
> + ifFalse: [delimiters].
> + keyStop := 1.
> + [keyStop <= self size] whileTrue: [
> + keyStart := self skipDelimiters: separators startingAt: keyStop.
> + keyStop := self findDelimiters: separators startingAt: keyStart.
> + keyStart < keyStop
> + ifTrue: [aBlock value: keyStart value: keyStop - 1]].!
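
Taken together, the suggestions for #findTokens:indicesDo: could be sketched as follows. This is an illustration of the review comments only; it drops the unused temporary, caches the size, and uses a brace array, and it makes no claim about measured performance.

```smalltalk
findTokens: delimiters indicesDo: aBlock
	"Parse self to find tokens between delimiters. Similar to #lineIndicesDo:."

	| keyStop keyStart separators size |
	"Brace array instead of Array with:"
	separators := delimiters isCharacter
		ifTrue: [{delimiters}]
		ifFalse: [delimiters].
	"Cache the size so the loop condition does not send #size each time."
	size := self size.
	keyStop := 1.
	[keyStop <= size] whileTrue: [
		keyStart := self skipDelimiters: separators startingAt: keyStop.
		keyStop := self findDelimiters: separators startingAt: keyStart.
		keyStart < keyStop
			ifTrue: [aBlock value: keyStart value: keyStop - 1]]
```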

Re: The Inbox: Collections-mt.838.mcz

marcel.taeumel
Hi Levente,

Thanks for the tips! I would appreciate another review of Collections-mt.839 and CollectionsTests-mt.313. :-)


Best,
Marcel

On 05.07.2019 at 00:44:39, Levente Uzonyi <[hidden email]> wrote:

On Thu, 4 Jul 2019, [hidden email] wrote:

> A new version of Collections was added to project The Inbox:
> http://source.squeak.org/inbox/Collections-mt.838.mcz
>
> ==================== Summary ====================
>
> Name: Collections-mt.838
> Author: mt
> Time: 4 July 2019, 4:32:40.854026 pm
> UUID: ac4ab442-79c0-d246-8dec-914be7ee5356
> Ancestors: Collections-pre.837
>
> To String, adds simple analysis of natural language in source code. No word stemming.
>
> 1) Refactor #findTokens: to look like #lines (i.e. #linesDo: and #lineIndicesDo:).
> 2) Add #findFeaturesDo: like #findTokens:do: and #linesDo:.
>
> Try this:
>
> HTTPDownloadRequest name findFeatures.
> (Morph >> #drawOn:) getSource asString findFeatures.
>
> Where can that be useful?
>
> - Automatic insertion of "*" for search terms like "WeakDictionary" to also find WeakIdentityDictionary etc.
> - Prefix emphasis for lists of class names in code browsers: MCAddition, MCAncestry, etc.

Given the new methods' complexity, I think they deserve tests.

>
> =============== Diff against Collections-pre.837 ===============
>
> Item was added:
> + ----- Method: String>>findFeatureIndicesDo: (in category 'accessing - features') -----
> + findFeatureIndicesDo: aBlock
> + "State machine that separates camelCase, UPPERCase, number/operator combinations and skips colons"

I think an example would make it easier to understand what this
method does. (The same applies to #findTokens:, but I'm already familiar
with that.)

> + | last state char "0 = start, 1 = a, 2 = A, 3 = AA, 4 = num, 5 = op" |
> +
> + state := 0.
> + last := 1.
> +
> + 1 to: self size do: [ :index |
> + char := self at: index.
> + "a"
> + char isLowercase ifTrue: [
> + (state < 3) ifTrue: [state := 1]. "*a -> a"
> + (state == 3) ifTrue: [

#= is optimized just as well as #== when the argument is a constant. Using
#= and dropping the unnecessary parentheses would make the code look a bit
less "C-style".

> + "AAa -> A + Aa (camel case follows uppercase)"
> + aBlock value: last value: index - 2.
> + last := index - 1.
> + state := 2].
> + (state > 3) ifTrue: [
> + "+a -> + | a (letter follows non-letter)"
> + aBlock value: last value: index - 1.
> + last := index.
> + state := 1]]
> + ifFalse: [
> + char isUppercase ifTrue: [
> + (state == 0)
> + ifTrue: [state := 2] "start -> A"
> + ifFalse: [
> + (state < 2 or: [state > 3]) ifTrue: [
> + "*A -> * | A (uppercase begins, flush before)"
> + aBlock value: last value: index - 1.
> + last := index.
> + state := 2] ifFalse: [
> + "AA -> AA (uppercase continues)"
> + state := 3]]]
> + ifFalse: [
> + ("char == $: or:" char isSeparator) ifTrue: [
> + "skip colon/whitespace"
> + (state > 0) ifTrue: [
> + aBlock value: last value: index - 1.
> + state := 0].
> + last := index + 1]
> + ifFalse: [
> + char isDigit ifTrue: [
> + (state == 0)
> + ifTrue: [state := 4]
> + ifFalse: [
> + (state ~= 4) ifTrue: [
> + aBlock value: last value: index - 1.
> + last := index.
> + state := 4]]]
> + ifFalse: [
> + (state == 0)
> + ifTrue: [state := 5]
> + ifFalse: [
> + (state < 5) ifTrue: [
> + aBlock value: last value: index - 1.
> + last := index.
> + state := 5]]]]]]].
> + last <= self size ifTrue: [
> + aBlock value: last value: self size]!
>
> Item was added:
> + ----- Method: String>>findFeatures (in category 'accessing - features') -----
> + findFeatures
> +
> + ^ Array streamContents: [:features |
> + self findFeaturesDo: [:feature | features nextPut: feature]]!
>
> Item was added:
> + ----- Method: String>>findFeaturesDo: (in category 'accessing - features') -----
> + findFeaturesDo: aBlock
> + "Simple analysis for natural language in source code. No support for word stemming."
> +
> + self findFeatureIndicesDo: [:start :end |
> + (self at: start) isLetter ifTrue: [
> + aBlock value: (self copyFrom: start to: end) asLowercase]].!
>
> Item was changed:
> ----- Method: String>>findTokens: (in category 'accessing') -----
> findTokens: delimiters
> + "Answer the collection of tokens that result from parsing self."
> +
> + ^ OrderedCollection streamContents: [:tokens |

#streamContents: should never be used with OrderedCollection.
OrderedCollection has its own streaming API (I would use #addLast: here)
which is way more efficient.

> + self
> + findTokens: delimiters
> + do: [:token | tokens nextPut: token]]!
> - "Answer the collection of tokens that result from parsing self. Return strings between the delimiters. Any character in the Collection delimiters marks a border. Several delimiters in a row are considered as just one separation. Also, allow delimiters to be a single character."
> -
> - | tokens keyStart keyStop separators |
> -
> - tokens := OrderedCollection new.
> - separators := delimiters isCharacter
> - ifTrue: [Array with: delimiters]
> - ifFalse: [delimiters].
> - keyStop := 1.
> - [keyStop <= self size] whileTrue:
> - [keyStart := self skipDelimiters: separators startingAt: keyStop.
> - keyStop := self findDelimiters: separators startingAt: keyStart.
> - keyStart < keyStop
> - ifTrue: [tokens add: (self copyFrom: keyStart to: (keyStop - 1))]].
> - ^tokens!
>
> Item was added:
> + ----- Method: String>>findTokens:do: (in category 'accessing') -----
> + findTokens: delimiters do: aBlock
> +
> + self
> + findTokens: delimiters
> + indicesDo: [:start :end | aBlock value: (self copyFrom: start to: end)].!
>
> Item was added:
> + ----- Method: String>>findTokens:indicesDo: (in category 'accessing') -----
> + findTokens: delimiters indicesDo: aBlock
> + "Parse self to find tokens between delimiters. Any character in the Collection delimiters marks a border. Several delimiters in a row are considered as just one separation. Also, allow delimiters to be a single character. Similar to #lineIndicesDo:."
> +
> + | tokens keyStart keyStop separators |

There are a few opportunities to regain the performance lost with the
introduction of blocks and sends:
- the tokens temporary is unused
- self size should be cached in a temporary (size)
- instead of Array >> #with:, a brace array should be used
- | keyStop keyStart separators size | would probably yield the best
performance

Levente

> + separators := delimiters isCharacter
> + ifTrue: [Array with: delimiters]
> + ifFalse: [delimiters].
> + keyStop := 1.
> + [keyStop <= self size] whileTrue: [
> + keyStart := self skipDelimiters: separators startingAt: keyStop.
> + keyStop := self findDelimiters: separators startingAt: keyStart.
> + keyStart < keyStop
> + ifTrue: [aBlock value: keyStart value: keyStop - 1]].!