The Inbox: Collections-nice.869.mcz

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
15 messages Options
Reply | Threaded
Open this post in threaded view
|

The Inbox: Collections-nice.869.mcz

commits-2
Nicolas Cellier uploaded a new version of Collections to project The Inbox:
http://source.squeak.org/inbox/Collections-nice.869.mcz

==================== Summary ====================

Name: Collections-nice.869
Author: nice
Time: 31 December 2019, 11:14:43.388306 pm
UUID: e5a9bef5-16d3-4d8a-bf08-040d30e4e907
Ancestors: Collections-nice.868

Remaster Collections-nice.464: opimize RunArray

This should not be noticeable for Text, but as a general library, it's important for any other potential use.

Among those optimizations, notice one important change: some of the enumerating methods like #collect: won't iterate on each element, but only once per run. Be aware that providing blocks with side effects like (i := i + 1) might lead to different behavior than ordinary ArrayedCollection.

Move RunArray off ArrayedCollection which serves nothing to such subclass.

Add the ability to remove: since it already has the hability to addFirst: and addLast:

Fix a few missing lastIndex cache flush, and advertise about the necessity to do it in class comment.

Deprecate mapValues: to the profit of replace:.

=============== Diff against Collections-nice.868 ===============

Item was changed:
+ SequenceableCollection subclass: #RunArray
- ArrayedCollection subclass: #RunArray
  instanceVariableNames: 'runs values lastIndex lastRun lastOffset'
  classVariableNames: ''
  poolDictionaries: ''
  category: 'Collections-Arrayed'!
 
+ !RunArray commentStamp: 'nice 12/30/2019 00:57' prior: 0!
- !RunArray commentStamp: '<historical>' prior: 0!
  My instances provide space-efficient storage of data which tends to be constant over long runs of the possible indices. Essentially repeated values are stored singly and then associated with a "run" length that denotes the number of consecutive occurrences of the value.
 
  My two important variables are
  runs An array of how many elements are in each run
  values An array of what the value is over those elements
 
  The variables lastIndex, lastRun and lastOffset cache the last access
  so that streaming through RunArrays is not an N-squared process.
+ Beware: methods modifying the RunArray contents should reset the lastIndex cache to nil.
 
  Many complexities of access can be bypassed by using the method
  RunArray withStartStopAndValueDo:!

Item was changed:
  ----- Method: RunArray class>>newFrom: (in category 'instance creation') -----
  newFrom: aCollection
  "Answer an instance of me containing the same elements as aCollection."
 
  | newCollection |
  newCollection := self new.
+ newCollection fillFrom: aCollection with: [:each | each].
- aCollection do: [:x | newCollection addLast: x].
  ^newCollection
 
  " RunArray newFrom: {1. 2. 2. 3}
  {1. $a. $a. 3} as: RunArray
  ({1. $a. $a. 3} as: RunArray) values
  "!

Item was added:
+ ----- Method: RunArray>>allSatisfy: (in category 'enumerating') -----
+ allSatisfy: aBlock
+ "Only evaluate once per run"
+
+ ^values allSatisfy: aBlock!

Item was added:
+ ----- Method: RunArray>>anySatisfy: (in category 'enumerating') -----
+ anySatisfy: aBlock
+ "Only evaluate once per run"
+
+ ^values anySatisfy: aBlock!

Item was added:
+ ----- Method: RunArray>>asBag (in category 'converting') -----
+ asBag
+ | aBag |
+ aBag := Bag new: values size.
+ self runsAndValuesDo: [:run :value |
+ aBag add: value withOccurrences: run].
+ ^aBag!

Item was added:
+ ----- Method: RunArray>>asSet (in category 'converting') -----
+ asSet
+ ^values asSet!

Item was changed:
+ ----- Method: RunArray>>coalesce (in category 'private') -----
- ----- Method: RunArray>>coalesce (in category 'adding') -----
  coalesce
+ "Coalesce theRuns and theValues if ever the values have adjacent equal objects"
+
+ | lastLength lastValue mustCoalesce coalescedRuns coalescedValued runIndex |
+ mustCoalesce := false.
+ runIndex := 0.
+ lastLength := 0.
+ lastValue := Object new.
+ runs with: values do: [:run :value |
+ (lastValue = value or: [run = 0])
+ ifTrue:
+ [mustCoalesce
+ ifFalse:
+ [coalescedRuns := (Array new: runs size) writeStream.
+ coalescedValued := (Array new: values size) writeStream.
+ coalescedRuns next: runIndex putAll: runs startingAt: 1.
+ coalescedValued next: runIndex putAll: values startingAt: 1.
+ mustCoalesce := true].
+ lastLength := lastLength + run]
+ ifFalse:
+ [lastLength > 0
+ ifTrue:
+ [mustCoalesce
+ ifTrue:
+ [coalescedRuns nextPut: lastLength.
+ coalescedValued nextPut: lastValue].
+ runIndex := runIndex + 1].
+ lastLength := run.
+ lastValue := value]].
+ mustCoalesce
+ ifTrue:
+ [lastLength > 0
+ ifTrue:
+ [coalescedRuns nextPut: lastLength.
+ coalescedValued nextPut: lastValue].
+ self setRuns: coalescedRuns contents setValues: coalescedValued contents]!
- "Try to combine adjacent runs"
- | ind |
- ind := 2.
- [ind > values size] whileFalse: [
- (values at: ind-1) = (values at: ind)
- ifFalse: [ind := ind + 1]
- ifTrue: ["two are the same, combine them"
- values := values copyReplaceFrom: ind to: ind with: #().
- runs at: ind-1 put: (runs at: ind-1) + (runs at: ind).
- runs := runs copyReplaceFrom: ind to: ind with: #().
- "self error: 'needed to combine runs' "]].
- !

Item was added:
+ ----- Method: RunArray>>collect: (in category 'enumerating') -----
+ collect: aBlock
+ "Beware, the block will be evaluated only once per group of values."
+ ^(self class runs: runs copy values: (values collect: aBlock)) coalesce!

Item was added:
+ ----- Method: RunArray>>copyUpThrough: (in category 'copying') -----
+ copyUpThrough: value
+ "Optimized"
+
+ | newSize newValues newRuns |
+ newSize := values indexOf: value startingAt: 1.
+ newSize = 0 ifTrue: [^self copy].
+ newRuns := runs copyFrom: 1 to: newSize.
+ newRuns at: newSize put: 1.
+ newValues := values copyFrom: 1 to: newSize.
+ ^ self class
+ runs: newRuns
+ values: newValues!

Item was added:
+ ----- Method: RunArray>>copyUpTo: (in category 'copying') -----
+ copyUpTo: anElement
+ "Optimized"
+
+ | newValues |
+ newValues := values copyUpTo: anElement.
+ ^ self class
+ runs: (runs copyFrom: 1 to: newValues size)
+ values: newValues!

Item was added:
+ ----- Method: RunArray>>copyUpToLast: (in category 'copying') -----
+ copyUpToLast: value
+ "Optimized"
+
+ | newSize run newRuns newValues |
+ newSize := values lastIndexOf: value startingAt: values size.
+ newSize = 0 ifTrue: [^self copy].
+ run := runs at: newSize.
+ run > 1
+ ifTrue:
+ [newRuns := runs copyFrom: 1 to: newSize.
+ newRuns at: newSize put: run - 1]
+ ifFalse:
+ [newSize := newSize - 1.
+ newRuns := runs copyFrom: 1 to: newSize].
+ newValues := values copyFrom: 1 to: newSize.
+ ^ self class
+ runs: newRuns
+ values: newValues!

Item was added:
+ ----- Method: RunArray>>count: (in category 'enumerating') -----
+ count: aBlock
+ "Beware, the block will be evaluated only once per group of values."
+ | count |
+ count := 0.
+ self runsAndValuesDo: [:run :value |
+ (aBlock value: value)
+ ifTrue:
+ [count := count + run]].
+ ^count!

Item was added:
+ ----- Method: RunArray>>detect:ifNone: (in category 'enumerating') -----
+ detect: aBlock ifNone: exceptionBlock
+ "Only evaluate once per run"
+
+ ^values detect: aBlock ifNone: exceptionBlock !

Item was added:
+ ----- Method: RunArray>>detectMax: (in category 'enumerating') -----
+ detectMax: aBlock
+ "Only evaluate once per run"
+
+ ^values detectMax: aBlock!

Item was added:
+ ----- Method: RunArray>>detectMin: (in category 'enumerating') -----
+ detectMin: aBlock
+ "Only evaluate once per run"
+
+ ^values detectMin: aBlock!

Item was added:
+ ----- Method: RunArray>>detectSum: (in category 'enumerating') -----
+ detectSum: aBlock
+ "Only loop on runs"
+ | sum |
+ sum := 0.
+ self runsAndValuesDo: [:run :value |
+ sum := (aBlock value: value) * run + sum].  
+ ^ sum!

Item was added:
+ ----- Method: RunArray>>do: (in category 'enumerating') -----
+ do: aBlock
+ "This is refined for speed"
+
+ 1 to: runs size do: [:i |
+ | r v |
+ v := values at: i.
+ r := runs at: i.
+ [( r := r - 1) >= 0]
+ whileTrue: [aBlock value: v]].!

Item was added:
+ ----- Method: RunArray>>fillFrom:with: (in category 'private') -----
+ fillFrom: aCollection with: aBlock
+ "Evaluate aBlock with each of aCollections's elements as the argument.  
+ Collect the resulting values into self. Answer self."
+
+ | newRuns newValues lastLength lastValue |
+ newRuns := (Array new: aCollection size) writeStream.
+ newValues := (Array new: aCollection size) writeStream.
+ lastLength := 0.
+ lastValue := Object new.
+ lastIndex := nil.  "flush access cache"
+ aCollection do: [:each |
+ | value |
+ value := aBlock value: each.
+ lastValue = value
+ ifTrue: [lastLength := lastLength + 1]
+ ifFalse:
+ [lastLength > 0
+ ifTrue:
+ [newRuns nextPut: lastLength.
+ newValues nextPut: lastValue].
+ lastLength := 1.
+ lastValue := value]].
+ lastLength > 0
+ ifTrue:
+ [newRuns nextPut: lastLength.
+ newValues nextPut: lastValue].
+ self setRuns: newRuns contents setValues: newValues contents!

Item was added:
+ ----- Method: RunArray>>findFirst: (in category 'enumerating') -----
+ findFirst: aBlock
+ | index |
+ index := 1.
+ self runsAndValuesDo: [ :run :value |
+ (aBlock value: value) ifTrue: [^index].
+ index := index + run].
+ ^0!

Item was added:
+ ----- Method: RunArray>>findLast: (in category 'enumerating') -----
+ findLast: aBlock
+ | index |
+ index := values size + 1.
+ [(index := index - 1) >= 1] whileTrue:
+ [(aBlock value: (values at: index)) ifTrue: [^(1 to: index) detectSum: [:i | runs at: i]]].
+ ^0!

Item was added:
+ ----- Method: RunArray>>includes: (in category 'testing') -----
+ includes: anObject
+ "Answer whether anObject is one of the receiver's elements."
+
+ ^values includes: anObject!

Item was added:
+ ----- Method: RunArray>>indexOf:startingAt: (in category 'accessing') -----
+ indexOf: anElement startingAt: start
+ "Answer the index of the first occurence of anElement after start
+ within the receiver. If the receiver does not contain anElement,
+ answer 0."
+
+ | index |
+ index := 1.
+ self runsAndValuesDo: [ :run :value |
+ (index >= start and: [value = anElement]) ifTrue: [^index].
+ index := index + run].
+ ^0!

Item was added:
+ ----- Method: RunArray>>indexOfAnyOf:startingAt: (in category 'accessing') -----
+ indexOfAnyOf: aCollection startingAt: start
+ "Answer the index of the first occurence of any element included in aCollection after start within the receiver.
+ If the receiver does not contain anElement, answer zero, which is an invalid index."
+
+ | index |
+ index := 1.
+ self runsAndValuesDo: [ :run :value |
+ (index >= start and: [aCollection includes: value]) ifTrue: [^index].
+ index := index + run].
+ ^0!

Item was added:
+ ----- Method: RunArray>>isSorted (in category 'testing') -----
+ isSorted
+ ^values isSorted!

Item was added:
+ ----- Method: RunArray>>isSortedBy: (in category 'testing') -----
+ isSortedBy: aBlock
+ ^values isSortedBy: aBlock!

Item was added:
+ ----- Method: RunArray>>lastIndexOf:startingAt: (in category 'accessing') -----
+ lastIndexOf: anElement startingAt: lastIndex
+ "Answer the index of the last occurence of anElement within the  
+ receiver. If the receiver does not contain anElement, answer 0."
+
+ | lastValueIndex |
+ lastValueIndex := values lastIndexOf: anElement startingAt: values size.
+ [lastValueIndex > 0] whileTrue:
+ [| i index |
+ i := index := 0.
+ [index <= lastIndex and: [(i := i + 1) <= lastValueIndex]]
+ whileTrue: [index := index + (runs at: i)].
+ index <= lastIndex ifTrue: [^index].
+ index - (runs at: lastValueIndex) < lastIndex ifTrue: [^lastIndex].
+ lastValueIndex := values lastIndexOf: anElement startingAt: lastValueIndex - 1].
+ ^0!

Item was added:
+ ----- Method: RunArray>>lastIndexOfAnyOf:startingAt: (in category 'accessing') -----
+ lastIndexOfAnyOf: aCollection startingAt: lastIndex
+ "Answer the index of the last occurence of any element of aCollection
+ within the receiver. If the receiver does not contain any of those
+ elements, answer 0"
+
+ | lastValueIndex |
+ lastValueIndex := values lastIndexOfAnyOf: aCollection startingAt: values size.
+ [lastValueIndex > 0] whileTrue:
+ [| i index |
+ i := index := 0.
+ [index <= lastIndex and: [(i := i + 1) <= lastValueIndex]]
+ whileTrue: [index := index + (runs at: i)].
+ index <= lastIndex ifTrue: [^index].
+ index - (runs at: lastValueIndex) < lastIndex ifTrue: [^lastIndex].
+ lastValueIndex := values lastIndexOfAnyOf: aCollection startingAt: lastValueIndex - 1].
+ ^0!

Item was removed:
- ----- Method: RunArray>>mapValues: (in category 'private') -----
- mapValues: mapBlock
- "NOTE: only meaningful to an entire set of runs"
-
- values := values collect: mapBlock!

Item was added:
+ ----- Method: RunArray>>noneSatisfy: (in category 'enumerating') -----
+ noneSatisfy: aBlock
+ "Only evaluate once per run"
+
+ ^values noneSatisfy: aBlock!

Item was changed:
+ ----- Method: RunArray>>rangeOf:startingAt: (in category 'accessing') -----
- ----- Method: RunArray>>rangeOf:startingAt: (in category 'adding') -----
  rangeOf: attr startingAt: startPos
  "Answer an interval that gives the range of attr at index position  startPos. An empty interval with start value startPos is returned when the attribute attr is not present at position startPos.  self size > 0 is assumed, it is the responsibility of the caller to test for emptiness of self.
  Note that an attribute may span several adjancent runs. "
 
  self at: startPos
  setRunOffsetAndValue:
              [:run :offset :value |
                 ^(value includes: attr)
                    ifFalse: [startPos to: startPos - 1]
                    ifTrue:
                      [ | firstRelevantPosition lastRelevantPosition idxOfCandidateRun |
                       lastRelevantPosition := startPos - offset + (runs at: run) - 1.
                       firstRelevantPosition := startPos - offset.
                       idxOfCandidateRun := run + 1.
                       [idxOfCandidateRun <= runs size
                               and: [(values at: idxOfCandidateRun) includes: attr]]
                          whileTrue:
                            [lastRelevantPosition := lastRelevantPosition + (runs at: idxOfCandidateRun).
                             idxOfCandidateRun := idxOfCandidateRun + 1].
                       idxOfCandidateRun := run - 1.
                       [idxOfCandidateRun >= 1
                               and: [(values at: idxOfCandidateRun) includes: attr]]
                          whileTrue:
                            [firstRelevantPosition := firstRelevantPosition - (runs at: idxOfCandidateRun).
                             idxOfCandidateRun := idxOfCandidateRun - 1].
   
                      firstRelevantPosition to: lastRelevantPosition]
   ]!

Item was added:
+ ----- Method: RunArray>>remove:ifAbsent: (in category 'removing') -----
+ remove: anObject ifAbsent: exceptionBlock
+ | index mustCoalesce run |
+ lastIndex := nil.  "flush access cache"
+ index := values indexOf: anObject ifAbsent: [^exceptionBlock value].
+ (run := runs at: index) > 1
+ ifTrue: [runs at: index put: run - 1]
+ ifFalse:
+ [mustCoalesce := index > 1 and: [index < values size and: [(values at: index - 1) = (values at: index + 1)]].
+ runs := runs copyWithoutIndex: index.
+ values := values copyWithoutIndex: index.
+ mustCoalesce
+ ifTrue:
+ [runs at: index - 1 put: (runs at: index - 1) + (runs at: index).
+ runs := runs copyWithoutIndex: index.
+ values := values copyWithoutIndex: index]].
+ ^anObject!

Item was added:
+ ----- Method: RunArray>>removeAll (in category 'removing') -----
+ removeAll
+ lastIndex := nil.  "flush access cache"
+ runs := Array new.
+ values := Array new!

Item was added:
+ ----- Method: RunArray>>replace: (in category 'enumerating') -----
+ replace: aBlock
+ "destructively replace the values in this RunArray with the ones transformed by aBlock."
+ lastIndex := nil.  "flush access cache"
+ values := values replace: aBlock.
+ self coalesce!

Item was added:
+ ----- Method: RunArray>>reverseDo: (in category 'enumerating') -----
+ reverseDo: aBlock
+ "This is refined for speed"
+
+ | i |
+ i := runs size.
+ [i > 0]
+ whileTrue:
+ [ | r v |
+ v := values at: i.
+ r := runs at: i.
+ i := i - 1.
+ [( r := r - 1) >= 0]
+ whileTrue: [aBlock value: v]].!

Item was added:
+ ----- Method: RunArray>>select: (in category 'enumerating') -----
+ select: aBlock
+ "Beware, the block will be evaluated only once per group of values."
+ | newRuns newValues |
+ newRuns := (Array new: runs size) writeStream.
+ newValues := (Array new: values size) writeStream.
+ self runsAndValuesDo: [:run :value |
+ (aBlock value: value)
+ ifTrue:
+ [newRuns nextPut: run.
+ newValues nextPut: value]].
+ ^(self class runs: newRuns contents values: newValues contents) coalesce!

Item was changed:
  ----- Method: SequenceableCollection>>lastIndexOf:startingAt: (in category 'accessing') -----
  lastIndexOf: anElement startingAt: lastIndex
  "Answer the index of the last occurence of anElement within the  
+ receiver. If the receiver does not contain anElement, answer 0."
- receiver. If the receiver does not contain anElement, answer the
- result of evaluating the argument, exceptionBlock."
 
  lastIndex to: 1 by: -1 do: [ :index |
  (self at: index) = anElement ifTrue: [ ^index ] ].
  ^0!

Item was changed:
  ----- Method: Text>>addAttribute:from:to: (in category 'emphasis') -----
  addAttribute: att from: start to: stop
  "Set the attribute for characters in the interval start to stop."
  runs :=  runs copyReplaceFrom: start to: stop
  with: ((runs copyFrom: start to: stop)
+ replace:
- mapValues:
  [:attributes | Text addAttribute: att toArray: attributes])
  !

Item was changed:
  ----- Method: Text>>removeAttribute:from:to: (in category 'emphasis') -----
  removeAttribute: att from: start to: stop
  "Remove the attribute over the interval start to stop."
  runs :=  runs copyReplaceFrom: start to: stop
  with: ((runs copyFrom: start to: stop)
+ replace:
- mapValues:
  [:attributes | attributes copyWithout: att])
  !


Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: Collections-nice.869.mcz

Christoph Thiede
Great, thanks for the enhancement!

Now we have got:


vs.


Would be great to align this. Is it necessary for RunArray >> do: to call
aBlock once per index?

Best,
Christoph



--
Sent from: http://forum.world.st/Squeak-Dev-f45488.html

Carpe Squeak!
Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: Collections-nice.869.mcz

Christoph Thiede

Oops, Nabble rejected all my <raw>code</code> examples, sorry.


Was:

i := 0.
((RunArray new: 4 withAll: 42) collect: [:x | i := i + 1]) asArray. "#(1 1 1 1)""
vs.
i := 0.
(RunArray new: 4 withAll: 42) collect: [:x | i := i + 1] as: Array. "#(1 2 3 4)"


Best,
Christoph


Von: Squeak-dev <[hidden email]> im Auftrag von Thiede, Christoph
Gesendet: Mittwoch, 1. Januar 2020 19:52 Uhr
An: [hidden email]
Betreff: Re: [squeak-dev] The Inbox: Collections-nice.869.mcz
 
Great, thanks for the enhancement!

Now we have got:


vs.


Would be great to align this. Is it necessary for RunArray >> do: to call
aBlock once per index?

Best,
Christoph



--
Sent from: http://forum.world.st/Squeak-Dev-f45488.html



Carpe Squeak!
Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: Collections-nice.869.mcz

Jakob Reschke
Hi Nicolas,

Am Mi., 1. Jan. 2020 um 19:48 Uhr schrieb Thiede, Christoph <[hidden email]>:

i := 0.
((RunArray new: 4 withAll: 42) collect: [:x | i := i + 1]) asArray. "#(1 1 1 1)""
This one is broken, isn't it? A RunArray is still a collection of elements and the comment of collect: in Collection states:

Evaluate aBlock with each of the receiver's elements as the argument.  
Collect the resulting values into a collection like the receiver. Answer  
the new collection. [emphasis mine]

IMO the storage optimization of RunArray should not affect the meaning of its operations. The intermediary RunArray in the code above should also contain 1, 2, 3, 4, even though it defeats the optimization.

Introduce a new selector for the optimized collect operation?

Kind regards,
Jakob


Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: Collections-nice.869.mcz

Nicolas Cellier
In reply to this post by Christoph Thiede
Hi Christoph,


Le mer. 1 janv. 2020 à 19:48, Thiede, Christoph <[hidden email]> a écrit :

Oops, Nabble rejected all my <raw>code</code> examples, sorry.


Was:

i := 0.
((RunArray new: 4 withAll: 42) collect: [:x | i := i + 1]) asArray. "#(1 1 1 1)""
vs.
i := 0.
(RunArray new: 4 withAll: 42) collect: [:x | i := i + 1] as: Array. "#(1 2 3 4)"


Best,
Christoph

Yes exactly, #collect: select: reject: replace: now have different behavior if we adopt those changes.
If unwanted, do not use RunArray.

Bag #collect: select: reject: could also behave like that.

Having side effects in do: blocks is OK.
Having side effects in collect: slect: reject: blocks is more questionable.
We use such side effects because we know that these methods are implemented with #do:, but it's not necessarily so.

I thought about #collect:as: and wondered if it should preserve iteration on each element or also act per run.
Since the intention is to transfer to another class, I think that element-wise loop is the least surprising.



Von: Squeak-dev <[hidden email]> im Auftrag von Thiede, Christoph
Gesendet: Mittwoch, 1. Januar 2020 19:52 Uhr
An: [hidden email]
Betreff: Re: [squeak-dev] The Inbox: Collections-nice.869.mcz
 
Great, thanks for the enhancement!

Now we have got:


vs.


Would be great to align this. Is it necessary for RunArray >> do: to call
aBlock once per index?

I don't think so.
It's expected to have block with side effects for do:
We have (aRunArray values do:) or (aRunArray runsAndValuesDo:)


Best,
Christoph



--
Sent from: http://forum.world.st/Squeak-Dev-f45488.html




Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: Collections-nice.869.mcz

Christoph Thiede

Hi all,


If unwanted, do not use RunArray.


In general, to me, this seems not to be the best argument :)
I regard RunArray as a collection with optimized storage properties. I would definitely expect to work it analogously to a normal collection.
Just imagine something like SequenceableCollection >> #compressed which would convert the receiver to the subclass best matching the data set (even if it does not exist at the moment). In my opinion, RunArray should be an implementation detail.

Having side effects in collect: slect: reject: blocks is more questionable.

Yes, it is questionabe, but with respect to the documentation comment, I tend to agree with Jakob ...

Introduce a new selector for the optimized collect operation?

I also thought about this. Though, this would blow up the collection protocol even more ...

Best,
Christoph


Von: Squeak-dev <[hidden email]> im Auftrag von Nicolas Cellier <[hidden email]>
Gesendet: Mittwoch, 1. Januar 2020 20:19:42
An: The general-purpose Squeak developers list
Betreff: Re: [squeak-dev] The Inbox: Collections-nice.869.mcz
 
Hi Christoph,


Le mer. 1 janv. 2020 à 19:48, Thiede, Christoph <[hidden email]> a écrit :

Oops, Nabble rejected all my <raw>code</code> examples, sorry.


Was:

i := 0.
((RunArray new: 4 withAll: 42) collect: [:x | i := i + 1]) asArray. "#(1 1 1 1)""
vs.
i := 0.
(RunArray new: 4 withAll: 42) collect: [:x | i := i + 1] as: Array. "#(1 2 3 4)"


Best,
Christoph

Yes exactly, #collect: select: reject: replace: now have different behavior if we adopt those changes.
If unwanted, do not use RunArray.

Bag #collect: select: reject: could also behave like that.

Having side effects in do: blocks is OK.
Having side effects in collect: slect: reject: blocks is more questionable.
We use such side effects because we know that these methods are implemented with #do:, but it's not necessarily so.

I thought about #collect:as: and wondered if it should preserve iteration on each element or also act per run.
Since the intention is to transfer to another class, I think that element-wise loop is the least surprising.



Von: Squeak-dev <[hidden email]> im Auftrag von Thiede, Christoph
Gesendet: Mittwoch, 1. Januar 2020 19:52 Uhr
An: [hidden email]
Betreff: Re: [squeak-dev] The Inbox: Collections-nice.869.mcz
 
Great, thanks for the enhancement!

Now we have got:


vs.


Would be great to align this. Is it necessary for RunArray >> do: to call
aBlock once per index?

I don't think so.
It's expected to have block with side effects for do:
We have (aRunArray values do:) or (aRunArray runsAndValuesDo:)


Best,
Christoph



--
Sent from: http://forum.world.st/Squeak-Dev-f45488.html




Carpe Squeak!
Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: Collections-nice.869.mcz

Nicolas Cellier
In reply to this post by Jakob Reschke
Hi Jakob,


Le mer. 1 janv. 2020 à 20:19, Jakob Reschke <[hidden email]> a écrit :
Hi Nicolas,

Am Mi., 1. Jan. 2020 um 19:48 Uhr schrieb Thiede, Christoph <[hidden email]>:

i := 0.
((RunArray new: 4 withAll: 42) collect: [:x | i := i + 1]) asArray. "#(1 1 1 1)""
This one is broken, isn't it? A RunArray is still a collection of elements and the comment of collect: in Collection states:

Evaluate aBlock with each of the receiver's elements as the argument.  
Collect the resulting values into a collection like the receiver. Answer  
the new collection. [emphasis mine]

Yes, indeed, this is a slight shift of semantics.

But RunArray are essentially unused (just to store Text attributes).
They make sense only for optimization.
So they can have their own semantic, bended to their own purpose.

It's just that we need to comment a bit more those methods.
If we don't want this behavior, then we probably don't want to use a RunArray
 
IMO the storage optimization of RunArray should not affect the meaning of its operations. The intermediary RunArray in the code above should also contain 1, 2, 3, 4, even though it defeats the optimization.

We have different classes with different trade offs and different behaviors.
So, if possible, a common behavior is

Introduce a new selector for the optimized collect operation?

I have used such RunArray extensively in VW for storing time series with many repeated values.
In case when the ratio of repetition was too low, I used raw Array, because RunArray would just spoil space and time.
Introducing a different selector would mean implementing it in other collection too.

So, yes, that would be a possible way, but I don't think that's the optimal way.

Kind regards,
Jakob



Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: Collections-nice.869.mcz

Jakob Reschke
Am Mi., 1. Jan. 2020 um 20:44 Uhr schrieb Nicolas Cellier <[hidden email]>:
Yes, indeed, this is a slight shift of semantics.

One could also call it not implementing the interface properly or breaking the contract ;-)
 
But RunArray are essentially unused (just to store Text attributes).
They make sense only for optimization.
So they can have their own semantic, bended to their own purpose.

Disregarding interfaces and contracts, yes. But it may surprise someone in the future.
 
It's just that we need to comment a bit more those methods.

I brought this up during a naming discussion in VMMaker some time ago: some people don't read the documentation and just rely on their expectations. It might be foolish to do so, but expectations can be justified and allow you to work more efficiently. I wouldn't like to re-check the implementation of #collect: for every new Collection subclass that comes along, in particular if I don't want to look it its implementation.
 
If we don't want this behavior, then we probably don't want to use a RunArray

Sometimes you don't know what kind of collection you get to #collect: from.
 
Introducing a different selector would mean implementing it in other collection too.

Not necessarily. Set also does not understand #before:, you must use a SequenceableCollection if you want it. Conversely if you knowingly use a RunArray (as in Text) and you want to exploit the optimization for the iteration blocks, you could use the new selector(s).


Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: Collections-nice.869.mcz

Nicolas Cellier


Le mer. 1 janv. 2020 à 21:02, Jakob Reschke <[hidden email]> a écrit :
Am Mi., 1. Jan. 2020 um 20:44 Uhr schrieb Nicolas Cellier <[hidden email]>:
Yes, indeed, this is a slight shift of semantics.

One could also call it not implementing the interface properly or breaking the contract ;-)
 
But RunArray are essentially unused (just to store Text attributes).
They make sense only for optimization.
So they can have their own semantic, bended to their own purpose.

Disregarding interfaces and contracts, yes. But it may surprise someone in the future.
 
It's just that we need to comment a bit more those methods.

I brought this up during a naming discussion in VMMaker some time ago: some people don't read the documentation and just rely on their expectations. It might be foolish to do so, but expectations can be justified and allow you to work more efficiently. I wouldn't like to re-check the implementation of #collect: for every new Collection subclass that comes along, in particular if I don't want to look it its implementation.
 
If we don't want this behavior, then we probably don't want to use a RunArray

Sometimes you don't know what kind of collection you get to #collect: from.
 
Introducing a different selector would mean implementing it in other collection too.

Not necessarily. Set also does not understand #before:, you must use a SequenceableCollection if you want it. Conversely if you knowingly use a RunArray (as in Text) and you want to exploit the optimization for the iteration blocks, you could use the new selector(s).

In my case, the client class had no idea whether a RunArray would be used as storage or not.
So that would mean either a beautiful isSomething query, or a common selector for all the possible storage class.

I could live with practically no side effects in those blocks.
It's just a matter of thinking at a different level, not at implementation level.
When we abuse those side effects, we let the implementation leaking.
It's like we want to do: and collect: in the same loop.
The proper way to write the example given by Christoph would be something like:

    (1 to: runArray size) with: runArray collect: [:i :v | i].

Using a block with side effects just is IMPLEMENTATION defined.
For example, if you do that with an Unordered collection, you'll get a different result too with #collect:as: and (#collect:)#as: too.
That's not unexpected. Enumeration order is implementation defined anyway.
Even if Ordered, nothing prevents a BTree implementation to use a different enumeration order too.

IMO, the fact that current phrasing explicitely tells about iterating on each elements does not have to be taken too literally.
It means that the block argument is applied on a single element at a time.

Try writing the comment without that contract in mind:
    "Answer a collection like the receiver whose elements are transformed using aBlock"

It's not super accurate.
If we want to be more accurate and express that aBlock will be evaluated with receiver elements, one at a time, we end up with describing the implementation rather than the semantics:
    "Evaluate the block with each element of the receiver as argument"

Indeed, explaining how is often a way to make understand the upper level concept...
This get over-interpreted, and we have put too much expectations!
IMO, it was not the spirit of the original contract :)

IMO we have to unlearn some of our biased expectations (and of course this includes myself), we have been abusing those side effects for too long.

Anyway, thanks for raising this up, that's exactly what I wanted to discuss.
If we're not ready yet, this can wait.


Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: Collections-nice.869.mcz

Jakob Reschke
Am Mi., 1. Jan. 2020 um 21:55 Uhr schrieb Nicolas Cellier <[hidden email]>:
Le mer. 1 janv. 2020 à 21:02, Jakob Reschke <[hidden email]> a écrit :
Am Mi., 1. Jan. 2020 um 20:44 Uhr schrieb Nicolas Cellier <[hidden email]>:
 
Introducing a different selector would mean implementing it in other collection too.

Not necessarily. Set also does not understand #before:, you must use a SequenceableCollection if you want it. Conversely if you knowingly use a RunArray (as in Text) and you want to exploit the optimization for the iteration blocks, you could use the new selector(s).

In my case, the client class had no idea whether a RunArray would be used as storage or not.
So that would mean either a beautiful isSomething query, or a common selector for all the possible storage class.

Ok but then you would still get away with

Collection>>collectWithoutSideEffects: aBlock
    "Transform my elements using aBlock, which is only guaranteed to be evaluated at least once per distinct element."
    ^ self collect: aBlock "implementation detail"

and specializing that for RunArray.

It's like we want to do: and collect: in the same loop.
The proper way to write the example given by Christoph would be something like:

    (1 to: runArray size) with: runArray collect: [:i :v | i].

Or rather just (1 to: runArray size), but that is beside the point.
The question is whether we have such an answer for all uses of collect:-with-side-effects. It is hard to answer unless we can enumerate all possible uses of collect:-with-side-effects. :-)
 
For example, if you do that with an Unordered collection, you'll get a different result too with #collect:as: and (#collect:)#as: too.
That's not unexpected. Enumeration order is implementation defined anyway.

Yes, but in addition to that difference, in Christoph's example, the intermediary RunArray (before #as) is also wrong (or unexpected) in my opinion, not just the end result.
 
IMO, the fact that current phrasing explicitely tells about iterating on each elements does not have to be taken too literally.
It means that the block argument is applied on a single element at a time.

Try writing the comment without that contract in mind:
    "Answer a collection like the receiver whose elements are transformed using aBlock"


Would that still include that the result collection is still of the same size as the receiver collection? This I consider important. And it somehow implies to me that the elements are transformed one by one because the block also only gets one element for each invocation.
 
IMO, it was not the spirit of the original contract :)

For what it's worth, the Blue Book states: "Evaluate the argument, aBlock, for each of the receiver's elements. Answer a new collection like that of the receiver containing the values returned by the block on each evaluation." (p. 137).
The subsequent example, contrary to the one provided by Christoph, makes use of the block argument, of course: "The resulting collection is the same size as [the original collection]. Each of the elements of the new collection is the [collected property] of the corresponding elements of [the original collection]." (p. 138).

The ANSI draft v1.9 states: "Answer a new collection constructed by gathering the results of evaluating [the block] with each element of the receiver. [...] For each element of the receiver, [the block] is evaluated with the element as the parameter." (p. 160) The refinements for subtypes are about Dictionary keys (p. 168), nil in hashed collections (pp. 178, 180), removed duplicates in Set (p. 180), and returning a different collection type in the cases of Interval (p. 194) and SortedCollection (p. 215).

So I think the new RunArray implementation deviates at least from the standard draft or would introduce a weighty refinement.
 
IMO we have to unlearn some of our biased expectations (and of course this includes myself), we have been abusing those side effects for too long.

Maybe, and it sure is purer without side-effects. But in the end Smalltalk blocks explicitly support side-effects. So I am wary to forbid them in an operation as general as #collect:. Although "forbid" is the wrong word, since one can still use side-effects, only not rely on them to behave equally under all circumstances.

I would favor the least surprising behavior rather than the most efficient if they contradict each other.


Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: Collections-nice.869.mcz

Nicolas Cellier


Le mer. 1 janv. 2020 à 23:11, Jakob Reschke <[hidden email]> a écrit :
Am Mi., 1. Jan. 2020 um 21:55 Uhr schrieb Nicolas Cellier <[hidden email]>:
Le mer. 1 janv. 2020 à 21:02, Jakob Reschke <[hidden email]> a écrit :
Am Mi., 1. Jan. 2020 um 20:44 Uhr schrieb Nicolas Cellier <[hidden email]>:
 
Introducing a different selector would mean implementing it in other collection too.

Not necessarily. Set also does not understand #before:, you must use a SequenceableCollection if you want it. Conversely if you knowingly use a RunArray (as in Text) and you want to exploit the optimization for the iteration blocks, you could use the new selector(s).

In my case, the client class had no idea whether a RunArray would be used as storage or not.
So that would mean either a beautiful isSomething query, or a common selector for all the possible storage class.

Ok but then you would still get away with

Collection>>collectWithoutSideEffects: aBlock
    "Transform my elements using aBlock, which is only guaranteed to be evaluated at least once per distinct element."
    ^ self collect: aBlock "implementation detail"

and specializing that for RunArray.


On the contrary, I would implement collectEachAndEvery: if someone depending on side effects really insist on legacy behavior.
RunArray is unused (Text handling does not depend on such side effect) and new definition is not going to cause any compatibility problem.

It's like we want to do: and collect: in the same loop.
The proper way to write the example given by Christoph would be something like:

    (1 to: runArray size) with: runArray collect: [:i :v | i].

Or rather just (1 to: runArray size), but that is beside the point.
The question is whether we have such an answer for all uses of collect:-with-side-effects. It is hard to answer unless we can enumerate all possible uses of collect:-with-side-effects. :-)
 
Well, we can detect block closing over outer variables and writing into them, but that's a restrictive notion of side effects.
But YAGNI, RunArray usage is very limited as said above.

For example, if you do that with an Unordered collection, you'll get a different result too with #collect:as: and (#collect:)#as: too.
That's not unexpected. Enumeration order is implementation defined anyway.

Yes, but in addition to that difference, in Christoph's example, the intermediary RunArray (before #as) is also wrong (or unexpected) in my opinion, not just the end result.
 
Yes.
Because we are thinking implementation-wise, like the comment has taught us to do.
That is what must be unlearnt.


IMO, the fact that current phrasing explicitely tells about iterating on each elements does not have to be taken too literally.
It means that the block argument is applied on a single element at a time.

Try writing the comment without that contract in mind:
    "Answer a collection like the receiver whose elements are transformed using aBlock"


Would that still include that the result collection is still of the same size as the receiver collection? This I consider important. And it somehow implies to me that the elements are transformed one by one because the block also only gets one element for each invocation.

Set collect: does not preserve size.
RunArray collect: does, new definition included.
The fact that it can collect a whole run at a time is just an implementation detail.

 
IMO, it was not the spirit of the original contract :)

For what it's worth, the Blue Book states: "Evaluate the argument, aBlock, for each of the receiver's elements. Answer a new collection like that of the receiver containing the values returned by the block on each evaluation." (p. 137).
The subsequent example, contrary to the one provided by Christoph, makes use of the block argument, of course: "The resulting collection is the same size as [the original collection]. Each of the elements of the new collection is the [collected property] of the corresponding elements of [the original collection]." (p. 138).

The ANSI draft v1.9 states: "Answer a new collection constructed by gathering the results of evaluating [the block] with each element of the receiver. [...] For each element of the receiver, [the block] is evaluated with the element as the parameter." (p. 160) The refinements for subtypes are about Dictionary keys (p. 168), nil in hashed collections (pp. 178, 180), removed duplicates in Set (p. 180), and returning a different collection type in the cases of Interval (p. 194) and SortedCollection (p. 215).
 
So I think the new RunArray implementation deviates at least from the standard draft or would introduce a weighty refinement.
 
IMO we have to unlearn some of our biased expectations (and of course this includes myself), we have been abusing those side effects for too long.

Maybe, and it sure is purer without side-effects. But in the end Smalltalk blocks explicitly support side-effects. So I am wary to forbid them in an operation as general as #collect:. Although "forbid" is the wrong word, since one can still use side-effects, only not rely on them to behave equally under all circumstances.

We don't forbid side effects.
Just warn that using side effects on implementation defined enumeration will result in implementation defined behavior.

I would favor the least surprising behavior rather than the most efficient if they contradict each other.

I understand.
But RunArray is all about optimization. It is its very logic to evaluate blocks once per run.
If we strictly adhere to this interpretation of method comment, detectMin: detectMax: detect:ifNone: and count: should also iterate on each and every element.
Do you agree that count: hardly benefits from such definition?


Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: Collections-nice.869.mcz

Jakob Reschke
Am Do., 2. Jan. 2020 um 21:12 Uhr schrieb Nicolas Cellier <[hidden email]>:
If we strictly adhere to this interpretation of method comment, detectMin: detectMax: detect:ifNone: and count: should also iterate on each and every element.
Do you agree that count: hardly benefits from such definition?

Instinctively I wanted to reply that the optimization in RunArray does not change the result of these operations contrary to collect: because they do not change elements, so equal elements could not suddenly become non-equal elements during the operation. But it got me thinking that I cannot imagine these usefully with side effects in the blocks... :-)

What about do: and inject:into: instead? Would you evaluate their blocks only once per run?


Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: Collections-nice.869.mcz

Nicolas Cellier


Le jeu. 2 janv. 2020 à 22:09, Jakob Reschke <[hidden email]> a écrit :
Am Do., 2. Jan. 2020 um 21:12 Uhr schrieb Nicolas Cellier <[hidden email]>:
If we strictly adhere to this interpretation of method comment, detectMin: detectMax: detect:ifNone: and count: should also iterate on each and every element.
Do you agree that count: hardly benefits from such definition?

Instinctively I wanted to reply that the optimization in RunArray does not change the result of these operations contrary to collect: because they do not change elements, so equal elements could not suddenly become non-equal elements during the operation. But it got me thinking that I cannot imagine these usefully with side effects in the blocks... :-)

What about do: and inject:into: instead? Would you evaluate their blocks only once per run?

No.
It would mean changing the semantics.
I don't want to change the semantics, just the implementation.
We have other selectors available for RunArray specific semantic, like aRunArray values do:, aRunArray runsAndValuesDo:

Side effects in collect: and select: (and count: etc...) are border line.
Either we expect that those methods must enumerate exactly like do: in order to support side effects uniformely.
Your reaction perfectly  demonstrates that this can be a legitimate expectation.
Or we consider that the enumeration is implementation defined, as long as the methods maps, filter, (count etc...).
I think that RunArray has a greater value in itself if we adopt the later, but it's only my own opinion so far.



Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: Collections-nice.869.mcz

Christoph Thiede

We have other selectors available for RunArray specific semantic, like aRunArray values do:, aRunArray runsAndValuesDo:


Side note: If you see any real use case for [aRunArray values do:], please make a forwarding selector #valuesDo: for it. I really don't want to encourage LoD violations. Alternatively, provide #runsAndValues as a dictionary? Hm ...

Von: Squeak-dev <[hidden email]> im Auftrag von Nicolas Cellier <[hidden email]>
Gesendet: Donnerstag, 2. Januar 2020 22:45:13
An: The general-purpose Squeak developers list
Betreff: Re: [squeak-dev] The Inbox: Collections-nice.869.mcz
 


Le jeu. 2 janv. 2020 à 22:09, Jakob Reschke <[hidden email]> a écrit :
Am Do., 2. Jan. 2020 um 21:12 Uhr schrieb Nicolas Cellier <[hidden email]>:
If we strictly adhere to this interpretation of method comment, detectMin: detectMax: detect:ifNone: and count: should also iterate on each and every element.
Do you agree that count: hardly benefits from such definition?

Instinctively I wanted to reply that the optimization in RunArray does not change the result of these operations contrary to collect: because they do not change elements, so equal elements could not suddenly become non-equal elements during the operation. But it got me thinking that I cannot imagine these usefully with side effects in the blocks... :-)

What about do: and inject:into: instead? Would you evaluate their blocks only once per run?

No.
It would mean changing the semantics.
I don't want to change the semantics, just the implementation.
We have other selectors available for RunArray specific semantic, like aRunArray values do:, aRunArray runsAndValuesDo:

Side effects in collect: and select: (and count: etc...) are border line.
Either we expect that those methods must enumerate exactly like do: in order to support side effects uniformely.
Your reaction perfectly  demonstrates that this can be a legitimate expectation.
Or we consider that the enumeration is implementation defined, as long as the methods maps, filter, (count etc...).
I think that RunArray has a greater value in itself if we adopt the later, but it's only my own opinion so far.



Carpe Squeak!
Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: Collections-nice.869.mcz

marcel.taeumel
In reply to this post by commits-2
Hi all!

I am in favor of improving RunArray. Treating RunArray as "array with
run-length encoding" breaks information hiding by expecting a specific
implementation strategy.

RunArray feels like Bag, but ordered. Maybe "Collections-Abstract" deserves
a new direct subclass? CompressedCollection? And then Bag and OrderedBag are
two examples for such compression?

We have already "Collections-Support", which lists several support classes
such as "Bitset" and "CharacterSet". Such classes do not comply with the
usual collection protocol anyway. For example, there is CharacterSet >>
#canBeEnumerated. Maybe RunArray can go there and also help OrderedBag come
to be? :-) Then, RunArray may also have its special semantics.

Best,
Marcel


commits-2 wrote

> Nicolas Cellier uploaded a new version of Collections to project The
> Inbox:
> http://source.squeak.org/inbox/Collections-nice.869.mcz
>
> ==================== Summary ====================
>
> Name: Collections-nice.869
> Author: nice
> Time: 31 December 2019, 11:14:43.388306 pm
> UUID: e5a9bef5-16d3-4d8a-bf08-040d30e4e907
> Ancestors: Collections-nice.868
>
> Remaster Collections-nice.464: opimize RunArray
>
> This should not be noticeable for Text, but as a general library, it's
> important for any other potential use.
>
> Among those optimizations, notice one important change: some of the
> enumerating methods like #collect: won't iterate on each element, but only
> once per run. Be aware that providing blocks with side effects like (i :=
> i + 1) might lead to different behavior than ordinary ArrayedCollection.
>
> Move RunArray off ArrayedCollection which serves nothing to such subclass.
>
> Add the ability to remove: since it already has the hability to addFirst:
> and addLast:
>
> Fix a few missing lastIndex cache flush, and advertise about the necessity
> to do it in class comment.
>
> Deprecate mapValues: to the profit of replace:.
>
> =============== Diff against Collections-nice.868 ===============
>
> Item was changed:
> + SequenceableCollection subclass: #RunArray
> - ArrayedCollection subclass: #RunArray
>   instanceVariableNames: 'runs values lastIndex lastRun lastOffset'
>   classVariableNames: ''
>   poolDictionaries: ''
>   category: 'Collections-Arrayed'!
>  
> + !RunArray commentStamp: 'nice 12/30/2019 00:57' prior: 0!
> - !RunArray commentStamp: '
> <historical>
> ' prior: 0!
>   My instances provide space-efficient storage of data which tends to be
> constant over long runs of the possible indices. Essentially repeated
> values are stored singly and then associated with a "run" length that
> denotes the number of consecutive occurrences of the value.
>  
>   My two important variables are
>   runs An array of how many elements are in each run
>   values An array of what the value is over those elements
>  
>   The variables lastIndex, lastRun and lastOffset cache the last access
>   so that streaming through RunArrays is not an N-squared process.
> + Beware: methods modifying the RunArray contents should reset the
> lastIndex cache to nil.
>  
>   Many complexities of access can be bypassed by using the method
>   RunArray withStartStopAndValueDo:!
>
> Item was changed:
>   ----- Method: RunArray class>>newFrom: (in category 'instance creation')
> -----
>   newFrom: aCollection
>   "Answer an instance of me containing the same elements as aCollection."
>  
>   | newCollection |
>   newCollection := self new.
> + newCollection fillFrom: aCollection with: [:each | each].
> - aCollection do: [:x | newCollection addLast: x].
>   ^newCollection
>  
>   " RunArray newFrom: {1. 2. 2. 3}
>   {1. $a. $a. 3} as: RunArray
>   ({1. $a. $a. 3} as: RunArray) values
>   "!
>
> Item was added:
> + ----- Method: RunArray>>allSatisfy: (in category 'enumerating') -----
> + allSatisfy: aBlock
> + "Only evaluate once per run"
> +
> + ^values allSatisfy: aBlock!
>
> Item was added:
> + ----- Method: RunArray>>anySatisfy: (in category 'enumerating') -----
> + anySatisfy: aBlock
> + "Only evaluate once per run"
> +
> + ^values anySatisfy: aBlock!
>
> Item was added:
> + ----- Method: RunArray>>asBag (in category 'converting') -----
> + asBag
> + | aBag |
> + aBag := Bag new: values size.
> + self runsAndValuesDo: [:run :value |
> + aBag add: value withOccurrences: run].
> + ^aBag!
>
> Item was added:
> + ----- Method: RunArray>>asSet (in category 'converting') -----
> + asSet
> + ^values asSet!
>
> Item was changed:
> + ----- Method: RunArray>>coalesce (in category 'private') -----
> - ----- Method: RunArray>>coalesce (in category 'adding') -----
>   coalesce
> + "Coalesce theRuns and theValues if ever the values have adjacent equal
> objects"
> +
> + | lastLength lastValue mustCoalesce coalescedRuns coalescedValued
> runIndex |
> + mustCoalesce := false.
> + runIndex := 0.
> + lastLength := 0.
> + lastValue := Object new.
> + runs with: values do: [:run :value |
> + (lastValue = value or: [run = 0])
> + ifTrue:
> + [mustCoalesce
> + ifFalse:
> + [coalescedRuns := (Array new: runs size) writeStream.
> + coalescedValued := (Array new: values size) writeStream.
> + coalescedRuns next: runIndex putAll: runs startingAt: 1.
> + coalescedValued next: runIndex putAll: values startingAt: 1.
> + mustCoalesce := true].
> + lastLength := lastLength + run]
> + ifFalse:
> + [lastLength > 0
> + ifTrue:
> + [mustCoalesce
> + ifTrue:
> + [coalescedRuns nextPut: lastLength.
> + coalescedValued nextPut: lastValue].
> + runIndex := runIndex + 1].
> + lastLength := run.
> + lastValue := value]].
> + mustCoalesce
> + ifTrue:
> + [lastLength > 0
> + ifTrue:
> + [coalescedRuns nextPut: lastLength.
> + coalescedValued nextPut: lastValue].
> + self setRuns: coalescedRuns contents setValues: coalescedValued
> contents]!
> - "Try to combine adjacent runs"
> - | ind |
> - ind := 2.
> - [ind > values size] whileFalse: [
> - (values at: ind-1) = (values at: ind)
> - ifFalse: [ind := ind + 1]
> - ifTrue: ["two are the same, combine them"
> - values := values copyReplaceFrom: ind to: ind with: #().
> - runs at: ind-1 put: (runs at: ind-1) + (runs at: ind).
> - runs := runs copyReplaceFrom: ind to: ind with: #().
> - "self error: 'needed to combine runs' "]].
> - !
>
> Item was added:
> + ----- Method: RunArray>>collect: (in category 'enumerating') -----
> + collect: aBlock
> + "Beware, the block will be evaluated only once per group of values."
> + ^(self class runs: runs copy values: (values collect: aBlock))
> coalesce!
>
> Item was added:
> + ----- Method: RunArray>>copyUpThrough: (in category 'copying') -----
> + copyUpThrough: value
> + "Optimized"
> +
> + | newSize newValues newRuns |
> + newSize := values indexOf: value startingAt: 1.
> + newSize = 0 ifTrue: [^self copy].
> + newRuns := runs copyFrom: 1 to: newSize.
> + newRuns at: newSize put: 1.
> + newValues := values copyFrom: 1 to: newSize.
> + ^ self class
> + runs: newRuns
> + values: newValues!
>
> Item was added:
> + ----- Method: RunArray>>copyUpTo: (in category 'copying') -----
> + copyUpTo: anElement
> + "Optimized"
> +
> + | newValues |
> + newValues := values copyUpTo: anElement.
> + ^ self class
> + runs: (runs copyFrom: 1 to: newValues size)
> + values: newValues!
>
> Item was added:
> + ----- Method: RunArray>>copyUpToLast: (in category 'copying') -----
> + copyUpToLast: value
> + "Optimized"
> +
> + | newSize run newRuns newValues |
> + newSize := values lastIndexOf: value startingAt: values size.
> + newSize = 0 ifTrue: [^self copy].
> + run := runs at: newSize.
> + run > 1
> + ifTrue:
> + [newRuns := runs copyFrom: 1 to: newSize.
> + newRuns at: newSize put: run - 1]
> + ifFalse:
> + [newSize := newSize - 1.
> + newRuns := runs copyFrom: 1 to: newSize].
> + newValues := values copyFrom: 1 to: newSize.
> + ^ self class
> + runs: newRuns
> + values: newValues!
>
> Item was added:
> + ----- Method: RunArray>>count: (in category 'enumerating') -----
> + count: aBlock
> + "Beware, the block will be evaluated only once per group of values."
> + | count |
> + count := 0.
> + self runsAndValuesDo: [:run :value |
> + (aBlock value: value)
> + ifTrue:
> + [count := count + run]].
> + ^count!
>
> Item was added:
> + ----- Method: RunArray>>detect:ifNone: (in category 'enumerating') -----
> + detect: aBlock ifNone: exceptionBlock
> + "Only evaluate once per run"
> +
> + ^values detect: aBlock ifNone: exceptionBlock !
>
> Item was added:
> + ----- Method: RunArray>>detectMax: (in category 'enumerating') -----
> + detectMax: aBlock
> + "Only evaluate once per run"
> +
> + ^values detectMax: aBlock!
>
> Item was added:
> + ----- Method: RunArray>>detectMin: (in category 'enumerating') -----
> + detectMin: aBlock
> + "Only evaluate once per run"
> +
> + ^values detectMin: aBlock!
>
> Item was added:
> + ----- Method: RunArray>>detectSum: (in category 'enumerating') -----
> + detectSum: aBlock
> + "Only loop on runs"
> + | sum |
> + sum := 0.
> + self runsAndValuesDo: [:run :value |
> + sum := (aBlock value: value) * run + sum].  
> + ^ sum!
>
> Item was added:
> + ----- Method: RunArray>>do: (in category 'enumerating') -----
> + do: aBlock
> + "This is refined for speed"
> +
> + 1 to: runs size do: [:i |
> + | r v |
> + v := values at: i.
> + r := runs at: i.
> + [( r := r - 1) >= 0]
> + whileTrue: [aBlock value: v]].!
>
> Item was added:
> + ----- Method: RunArray>>fillFrom:with: (in category 'private') -----
> + fillFrom: aCollection with: aBlock
> + "Evaluate aBlock with each of aCollections's elements as the argument.  
> + Collect the resulting values into self. Answer self."
> +
> + | newRuns newValues lastLength lastValue |
> + newRuns := (Array new: aCollection size) writeStream.
> + newValues := (Array new: aCollection size) writeStream.
> + lastLength := 0.
> + lastValue := Object new.
> + lastIndex := nil.  "flush access cache"
> + aCollection do: [:each |
> + | value |
> + value := aBlock value: each.
> + lastValue = value
> + ifTrue: [lastLength := lastLength + 1]
> + ifFalse:
> + [lastLength > 0
> + ifTrue:
> + [newRuns nextPut: lastLength.
> + newValues nextPut: lastValue].
> + lastLength := 1.
> + lastValue := value]].
> + lastLength > 0
> + ifTrue:
> + [newRuns nextPut: lastLength.
> + newValues nextPut: lastValue].
> + self setRuns: newRuns contents setValues: newValues contents!
>
> Item was added:
> + ----- Method: RunArray>>findFirst: (in category 'enumerating') -----
> + findFirst: aBlock
> + | index |
> + index := 1.
> + self runsAndValuesDo: [ :run :value |
> + (aBlock value: value) ifTrue: [^index].
> + index := index + run].
> + ^0!
>
> Item was added:
> + ----- Method: RunArray>>findLast: (in category 'enumerating') -----
> + findLast: aBlock
> + | index |
> + index := values size + 1.
> + [(index := index - 1) >= 1] whileTrue:
> + [(aBlock value: (values at: index)) ifTrue: [^(1 to: index) detectSum:
> [:i | runs at: i]]].
> + ^0!
>
> Item was added:
> + ----- Method: RunArray>>includes: (in category 'testing') -----
> + includes: anObject
> + "Answer whether anObject is one of the receiver's elements."
> +
> + ^values includes: anObject!
>
> Item was added:
> + ----- Method: RunArray>>indexOf:startingAt: (in category 'accessing')
> -----
> + indexOf: anElement startingAt: start
> + "Answer the index of the first occurence of anElement after start
> + within the receiver. If the receiver does not contain anElement,
> + answer 0."
> +
> + | index |
> + index := 1.
> + self runsAndValuesDo: [ :run :value |
> + (index >= start and: [value = anElement]) ifTrue: [^index].
> + index := index + run].
> + ^0!
>
> Item was added:
> + ----- Method: RunArray>>indexOfAnyOf:startingAt: (in category
> 'accessing') -----
> + indexOfAnyOf: aCollection startingAt: start
> + "Answer the index of the first occurence of any element included in
> aCollection after start within the receiver.
> + If the receiver does not contain anElement, answer zero, which is an
> invalid index."
> +
> + | index |
> + index := 1.
> + self runsAndValuesDo: [ :run :value |
> + (index >= start and: [aCollection includes: value]) ifTrue: [^index].
> + index := index + run].
> + ^0!
>
> Item was added:
> + ----- Method: RunArray>>isSorted (in category 'testing') -----
> + isSorted
> + ^values isSorted!
>
> Item was added:
> + ----- Method: RunArray>>isSortedBy: (in category 'testing') -----
> + isSortedBy: aBlock
> + ^values isSortedBy: aBlock!
>
> Item was added:
> + ----- Method: RunArray>>lastIndexOf:startingAt: (in category
> 'accessing') -----
> + lastIndexOf: anElement startingAt: lastIndex
> + "Answer the index of the last occurence of anElement within the  
> + receiver. If the receiver does not contain anElement, answer 0."
> +
> + | lastValueIndex |
> + lastValueIndex := values lastIndexOf: anElement startingAt: values
> size.
> + [lastValueIndex > 0] whileTrue:
> + [| i index |
> + i := index := 0.
> + [index <= lastIndex and: [(i := i + 1) <= lastValueIndex]]
> + whileTrue: [index := index + (runs at: i)].
> + index <= lastIndex ifTrue: [^index].
> + index - (runs at: lastValueIndex) < lastIndex ifTrue: [^lastIndex].
> + lastValueIndex := values lastIndexOf: anElement startingAt:
> lastValueIndex - 1].
> + ^0!
>
> Item was added:
> + ----- Method: RunArray>>lastIndexOfAnyOf:startingAt: (in category
> 'accessing') -----
> + lastIndexOfAnyOf: aCollection startingAt: lastIndex
> + "Answer the index of the last occurence of any element of aCollection
> + within the receiver. If the receiver does not contain any of those
> + elements, answer 0"
> +
> + | lastValueIndex |
> + lastValueIndex := values lastIndexOfAnyOf: aCollection startingAt:
> values size.
> + [lastValueIndex > 0] whileTrue:
> + [| i index |
> + i := index := 0.
> + [index <= lastIndex and: [(i := i + 1) <= lastValueIndex]]
> + whileTrue: [index := index + (runs at: i)].
> + index <= lastIndex ifTrue: [^index].
> + index - (runs at: lastValueIndex) < lastIndex ifTrue: [^lastIndex].
> + lastValueIndex := values lastIndexOfAnyOf: aCollection startingAt:
> lastValueIndex - 1].
> + ^0!
>
> Item was removed:
> - ----- Method: RunArray>>mapValues: (in category 'private') -----
> - mapValues: mapBlock
> - "NOTE: only meaningful to an entire set of runs"
> -
> - values := values collect: mapBlock!
>
> Item was added:
> + ----- Method: RunArray>>noneSatisfy: (in category 'enumerating') -----
> + noneSatisfy: aBlock
> + "Only evaluate once per run"
> +
> + ^values noneSatisfy: aBlock!
>
> Item was changed:
> + ----- Method: RunArray>>rangeOf:startingAt: (in category 'accessing')
> -----
> - ----- Method: RunArray>>rangeOf:startingAt: (in category 'adding') -----
>   rangeOf: attr startingAt: startPos
>   "Answer an interval that gives the range of attr at index position
> startPos. An empty interval with start value startPos is returned when the
> attribute attr is not present at position startPos.  self size > 0 is
> assumed, it is the responsibility of the caller to test for emptiness of
> self.
>   Note that an attribute may span several adjancent runs. "
>  
>   self at: startPos
>   setRunOffsetAndValue:
>               [:run :offset :value |
>                  ^(value includes: attr)
>                     ifFalse: [startPos to: startPos - 1]
>                     ifTrue:
>                       [ | firstRelevantPosition lastRelevantPosition
> idxOfCandidateRun |
>                        lastRelevantPosition := startPos - offset + (runs
> at: run) - 1.
>                        firstRelevantPosition := startPos - offset.
>                        idxOfCandidateRun := run + 1.
>                        [idxOfCandidateRun <= runs size
>                                and: [(values at: idxOfCandidateRun)
> includes: attr]]
>                           whileTrue:
>                             [lastRelevantPosition := lastRelevantPosition
> + (runs at: idxOfCandidateRun).
>                              idxOfCandidateRun := idxOfCandidateRun + 1].
>                        idxOfCandidateRun := run - 1.
>                        [idxOfCandidateRun >= 1
>                                and: [(values at: idxOfCandidateRun)
> includes: attr]]
>                           whileTrue:
>                             [firstRelevantPosition :=
> firstRelevantPosition - (runs at: idxOfCandidateRun).
>                              idxOfCandidateRun := idxOfCandidateRun - 1].
>    
>                       firstRelevantPosition to: lastRelevantPosition]
>    ]!
>
> Item was added:
> + ----- Method: RunArray>>remove:ifAbsent: (in category 'removing') -----
> + remove: anObject ifAbsent: exceptionBlock
> + | index mustCoalesce run |
> + lastIndex := nil.  "flush access cache"
> + index := values indexOf: anObject ifAbsent: [^exceptionBlock value].
> + (run := runs at: index) > 1
> + ifTrue: [runs at: index put: run - 1]
> + ifFalse:
> + [mustCoalesce := index > 1 and: [index < values size and: [(values
> at: index - 1) = (values at: index + 1)]].
> + runs := runs copyWithoutIndex: index.
> + values := values copyWithoutIndex: index.
> + mustCoalesce
> + ifTrue:
> + [runs at: index - 1 put: (runs at: index - 1) + (runs at: index).
> + runs := runs copyWithoutIndex: index.
> + values := values copyWithoutIndex: index]].
> + ^anObject!
>
> Item was added:
> + ----- Method: RunArray>>removeAll (in category 'removing') -----
> + removeAll
> + lastIndex := nil.  "flush access cache"
> + runs := Array new.
> + values := Array new!
>
> Item was added:
> + ----- Method: RunArray>>replace: (in category 'enumerating') -----
> + replace: aBlock
> + "destructively replace the values in this RunArray with the ones
> transformed by aBlock."
> + lastIndex := nil.  "flush access cache"
> + values := values replace: aBlock.
> + self coalesce!
>
> Item was added:
> + ----- Method: RunArray>>reverseDo: (in category 'enumerating') -----
> + reverseDo: aBlock
> + "This is refined for speed"
> +
> + | i |
> + i := runs size.
> + [i > 0]
> + whileTrue:
> + [ | r v |
> + v := values at: i.
> + r := runs at: i.
> + i := i - 1.
> + [( r := r - 1) >= 0]
> + whileTrue: [aBlock value: v]].!
>
> Item was added:
> + ----- Method: RunArray>>select: (in category 'enumerating') -----
> + select: aBlock
> + "Beware, the block will be evaluated only once per group of values."
> + | newRuns newValues |
> + newRuns := (Array new: runs size) writeStream.
> + newValues := (Array new: values size) writeStream.
> + self runsAndValuesDo: [:run :value |
> + (aBlock value: value)
> + ifTrue:
> + [newRuns nextPut: run.
> + newValues nextPut: value]].
> + ^(self class runs: newRuns contents values: newValues contents)
> coalesce!
>
> Item was changed:
>   ----- Method: SequenceableCollection>>lastIndexOf:startingAt: (in
> category 'accessing') -----
>   lastIndexOf: anElement startingAt: lastIndex
>   "Answer the index of the last occurence of anElement within the  
> + receiver. If the receiver does not contain anElement, answer 0."
> - receiver. If the receiver does not contain anElement, answer the
> - result of evaluating the argument, exceptionBlock."
>  
>   lastIndex to: 1 by: -1 do: [ :index |
>   (self at: index) = anElement ifTrue: [ ^index ] ].
>   ^0!
>
> Item was changed:
>   ----- Method: Text>>addAttribute:from:to: (in category 'emphasis') -----
>   addAttribute: att from: start to: stop
>   "Set the attribute for characters in the interval start to stop."
>   runs :=  runs copyReplaceFrom: start to: stop
>   with: ((runs copyFrom: start to: stop)
> + replace:
> - mapValues:
>   [:attributes | Text addAttribute: att toArray: attributes])
>   !
>
> Item was changed:
>   ----- Method: Text>>removeAttribute:from:to: (in category 'emphasis')
> -----
>   removeAttribute: att from: start to: stop
>   "Remove the attribute over the interval start to stop."
>   runs :=  runs copyReplaceFrom: start to: stop
>   with: ((runs copyFrom: start to: stop)
> + replace:
> - mapValues:
>   [:attributes | attributes copyWithout: att])
>   !





--
Sent from: http://forum.world.st/Squeak-Dev-f45488.html