Hi,
while browsing the Containers sources, I discovered a performance leak. SortedCollection>>includes: does a _linear_ search on a *sorted* collection. We all(?) know however, that this can be done in O(log(n)) time complexity using a binary search algorithm. I'm aware that in Smalltalk the simple algorithm is not applicable, because the user can define her/his own SortBlock. But while preserving the sort property invariant, (i)(a1 <= a2 <= ...<= an) respectively (ii)(an <= ... <= a2 <= a1), for ascending/descending sorted collections, we can use this modified version, which determines with a simple test, whether the collection is in ascending/descending order. a1 <= an --> ascending, assuming sort property (i) holds (inductive closure) an <= a1 --> descending, dto. for (ii) Furthermore, a collection is sortable, iff the elements class provides a "<=" relation. So, if our collection holds the sort property invariant *and* our algorithm uses these operators (<,= or <=) only, we can use this algorithm. (q.e.d.) Of course the sort order property can be broken by a pathological user defined SortBlock, but then, we have no *SortedCollection* at all because it simply isn't sorted. Then and only then, the algorithm fails. I think, this is a sufficient formal proof for introducing the SequenceableCollection>>binarySearchFor: method and safely overwriting SortedCollection>>includes: ------------------ SNIP ------------------ !SortedCollection methodsFor! includes: target ^self binarySearchFor: target The well known algorithm, modified for ascending/descending collections: ------------------ SNIP ------------------ !SequenceableCollection methodsFor! binarySearchFor: aKey |l r m found| "wp: receiver has to be sorted" l := 1. r := self size. found := false. (self basicAt: l) <= (self basicAt: r) ifTrue: [ "ascending order" [found not and: [l <= r]] whileTrue:[ m := (l + r) // 2. (found := ((self basicAt: m) = aKey)) ifFalse:[ aKey < (self basicAt: m) ifTrue: [ r := m - 1] ifFalse: [ l := m + 1] ] ] ] ifFalse: [ "descending order" [found not and: [l <= r]] whileTrue:[ m := (l + r) // 2. (found := ((self basicAt: m) = aKey)) ifFalse:[ aKey < (self basicAt: m) ifTrue: [ l := m + 1] ifFalse: [ r := m - 1] ] ] ]. ^found --------------- SNIP ---------------- I believe the examples dont' have to be commented. oc := OrderedCollection new. 100000 timesRepeat:[oc add: (RandomString new: 10)]. sc := oc asSortedCollection. dc := oc asSortedCollection sortBlock: [:sx :sy | sy <= sx]. Time millisecondsToRun:[1000 timesRepeat:[oc includes: oc any]] 4108 Time millisecondsToRun:[1000 timesRepeat:[sc includes: sc any]] 28 Time millisecondsToRun:[1000 timesRepeat:[dc includes: dc any]] 28 (N.B. Collection>>any picks a random element) -------------- SNIP -------------------- After all you might think that this is not important, because Smalltalk has much better methods for "Set representations" than sorted collections. These use hash algorithms, that are O(1), which is of course transcender. Well, that's right, but if/when we already *have* a SortedCollection, then we definitely *should* use a better search operation than linear. Bye Ingo |
>...
> > I believe the examples dont' have to be commented. > > oc := OrderedCollection new. > 100000 timesRepeat:[oc add: (RandomString new: 10)]. > sc := oc asSortedCollection. > dc := oc asSortedCollection sortBlock: [:sx :sy | sy <= sx]. > > > Time millisecondsToRun:[1000 timesRepeat:[oc includes: oc any]] 4108 > Time millisecondsToRun:[1000 timesRepeat:[sc includes: sc any]] 28 > Time millisecondsToRun:[1000 timesRepeat:[dc includes: dc any]] 28 OK, but I think you need a different algorithm: | selectors | selectors := Dictionary new. Object withAllSubclassesDo: [:each | each selectors do: [:sel | selectors at: sel put: (selectors at: sel ifAbsent: [0]) + 1]]. (selectors associations asSortedCollection: [:a :b | a value < b value]) includes: (selectors associationAt: #printOn:) This should return true (and does under Dolphin's implementation), but returns false under yours. John Brant |
> OK, but I think you need a different algorithm:
> > | selectors | > selectors := Dictionary new. > Object withAllSubclassesDo: [:each | > each selectors do: [:sel | > selectors at: sel put: (selectors at: sel ifAbsent: [0]) + 1]]. > (selectors associations asSortedCollection: [:a :b | a value < b value]) > includes: (selectors associationAt: #printOn:) > > > This should return true (and does under Dolphin's implementation), but > returns false under yours. > > > John Brant > John, thank you for this. I was obviously fixed by the formal proof, so I neglected the concrete implementation. My solution works with simple types only, not with compound structures. This can be easily fixed. It introduces however a restriction, that binarySearchFor: has to be placed into SortedCollection, not SequenceableCollection, because it makes use of its sortBlock property. Furthermore this new algorithm assumes that the sort block is expressed in terms of "<" (or "<="). The original assumed, that the elements class implemented "<" . I don't think that this is a problem in praxi. Finally you can decide whether to overide SortedCollection>>includes: or using binarySearchFor: ad hoc. Thanks again Regards Ingo ---------------- SNIP --------------------- !SortedCollection methodsFor! binarySearchFor: aKey |l r m found sb| "wp: receiver has to be sorted, sortBlock is expressed in terms of <= (or <)" found := false. l := 1. r := self size. sb := self sortBlock. (sb value: (self basicAt: l) value: (self basicAt: r)) ifTrue: [ "ascending order" [found not and: [ l <= r]] whileTrue:[ m := (l + r) // 2. (found := ((self basicAt: m) = aKey)) ifFalse:[ (sb value: aKey value: (self basicAt: m)) ifTrue: [ r := m - 1] ifFalse:[ l := m + 1] ] ] ] ifFalse: [ "descending order" [found not and: [ l <= r]] whileTrue:[ m := (l + r) // 2. (found := ((self basicAt: m) = aKey)) ifFalse:[ (sb value: aKey value: (self basicAt: m)) ifTrue: [ l := m + 1] ifFalse:[ r := m - 1] ] ] ]. ^found -------------------- SNIP ------------------------ selectors := Dictionary new. Object withAllSubclassesDo: [:each | each selectors do: [:sel | selectors at: sel put: (selectors at: sel ifAbsent: [0]) + 1]]. (selectors associations asSortedCollection: [:a :b | a value < b value]) includes: (selectors associationAt: #printOn:) true |
"Ingo Blank" <[hidden email]> wrote in message
news:3b1920fa$0$133$[hidden email]... > > > > OK, but I think you need a different algorithm: > > > > | selectors | > > selectors := Dictionary new. > > Object withAllSubclassesDo: [:each | > > each selectors do: [:sel | > > selectors at: sel put: (selectors at: sel ifAbsent: [0]) + 1]]. > > (selectors associations asSortedCollection: [:a :b | a value < b value]) > > includes: (selectors associationAt: #printOn:) > > > > > > This should return true (and does under Dolphin's implementation), but > > returns false under yours. > > I was obviously fixed by the formal proof, so I neglected the concrete > implementation. My solution works with simple types only, not with > structures. > > This can be easily fixed. > > It introduces however a restriction, that binarySearchFor: has to be placed > into SortedCollection, not SequenceableCollection, because it makes use of > its sortBlock property. > > Furthermore this new algorithm assumes that the sort block is expressed in > terms of "<" (or "<="). The original assumed, that the elements class > implemented "<" . I don't think that this is a problem in praxi. Not quite there yet: (#('aaa' 'abb' 'acc') asSortedCollection: [:a :b | a first < b first]) includes: 'aaa' Returns false (at least in my image -- the sorted collection is 'abb', 'acc', 'aaa'). I think the best you can do is find the left and right most elements that is sort block equal to the object your are searching for, and then do a linear search in this range. You can test if something is sort block equal by evaluating the sort block both ways. For example, this should return true if anObject is sort block equal to anotherObject: (sortBlock value: anObject value: anotherObject) = (sortBlock value: anotherObject value: anObject) Also, you should use at: instead of basicAt: -- there may be empty space at the beginning of the list. John Brant |
"John Brant" <[hidden email]> schrieb im Newsbeitrag
news:dDaS6.13784$[hidden email]... > "Ingo Blank" <[hidden email]> wrote in message > news:3b1920fa$0$133$[hidden email]... > > > > > > > OK, but I think you need a different algorithm: > > > > > > | selectors | > > > selectors := Dictionary new. > > > Object withAllSubclassesDo: [:each | > > > each selectors do: [:sel | > > > selectors at: sel put: (selectors at: sel ifAbsent: [0]) + > > > (selectors associations asSortedCollection: [:a :b | a value < b value]) > > > includes: (selectors associationAt: #printOn:) > > > > > > > > > This should return true (and does under Dolphin's implementation), but > > > returns false under yours. > > > > I was obviously fixed by the formal proof, so I neglected the concrete > > implementation. My solution works with simple types only, not with > compound > > structures. > > > > This can be easily fixed. > > > > It introduces however a restriction, that binarySearchFor: has to be > placed > > into SortedCollection, not SequenceableCollection, because it makes use > > its sortBlock property. > > > > Furthermore this new algorithm assumes that the sort block is expressed in > > terms of "<" (or "<="). The original assumed, that the elements class > > implemented "<" . I don't think that this is a problem in praxi. > > Not quite there yet: > > (#('aaa' 'abb' 'acc') asSortedCollection: [:a :b | a first < b first]) > includes: 'aaa' > > Returns false (at least in my image -- the sorted collection is 'abb', > 'acc', 'aaa'). I think the best you can do is find the left and right most > elements that is sort block equal to the object your are searching for, > then do a linear search in this range. You can test if something is sort > block equal by evaluating the sort block both ways. For example, this should > return true if anObject is sort block equal to anotherObject: > (sortBlock value: anObject value: anotherObject) = (sortBlock value: > anotherObject value: anObject) > > Also, you should use at: instead of basicAt: -- there may be empty space at > the beginning of the list. > > > John Brant > John, basicAt: was introduced in OrderedCollection>>includes: for performance, so it must be legal to use it in SortedCollection. The reason why your somewhat tricky, but perfectly legal, example lets fail my algorithm is, that it is undetermined whether the collection is in ascending/descending order. So when it "runs" in the "wrong" direction by accident, it fails. Well, we catch this possibility and try twice, one for ascending order and one for descending. O(log n) is not affected by this. Your tip with sortBlockEqual was very valuable ;-) (#('aaa' 'abb' 'acc') asSortedCollection: [:a :b | a first < b first]) includes: 'aaa' --> true ------------------ SNIP ---------------------------- SortedCollection>>sortBlockEqual: a and: b sortBlockEqual: a and: b |sb| sb := self sortBlock. ^(sb value: a value: b) = (sb value: b value: a) ----------------------------------------------------------------------- SortedCollection>>binarySearchFor: aKey binarySearchFor: aKey |l r m found sb| "wp: receiver has to be sorted, sortBlock is expressed in terms of <= (or <)" found := false. l := 1. r := self size. sb := self sortBlock. (self sortBlockEqual: (self basicAt: l) and: (self basicAt: r)) ifTrue:[ "neither nor a/d" [found not and: [ l <= r]] whileTrue:[ "Try ascending" m := (l + r) // 2. (found := ((self basicAt: m) = aKey)) ifFalse:[ (sb value: aKey value: (self basicAt: m)) ifTrue: [ r := m - 1] ifFalse:[ l := m + 1] ] ]. found ifFalse:[ l := 1. r := self size]. "Try descending" [found not and: [ l <= r]] whileTrue:[ m := (l + r) // 2. (found := ((self basicAt: m) = aKey)) ifFalse:[ (sb value: aKey value: (self basicAt: m)) ifTrue: [ l := m + 1] ifFalse:[ r := m - 1] ] ] ] ifFalse: ["either ascending or descending" (sb value: (self basicAt: l) value: (self basicAt: r)) ifTrue: [ "ascending order" [found not and: [ l <= r]] whileTrue:[ m := (l + r) // 2. (found := ((self basicAt: m) = aKey)) ifFalse:[ (sb value: aKey value: (self basicAt: m)) ifTrue: [ r := m - 1] ifFalse:[ l := m + 1] ] ] ] ifFalse: [ "descending order" [found not and: [ l <= r]] whileTrue:[ m := (l + r) // 2. (found := ((self basicAt: m) = aKey)) ifFalse:[ (sb value: aKey value: (self basicAt: m)) ifTrue: [ l := m + 1] ifFalse:[ r := m - 1] ] ] ] ]. ^found ---------------------- SNIP --------------------------- Thanks again for your valuable tips. Regards Ingo |
> basicAt: was introduced in OrderedCollection>>includes: for performance,
so > it must be legal to use it in SortedCollection. However, you are assuming that the collection begins at 1. Try this: #(a b c) asSortedCollection removeFirst; includes: #b > The reason why your somewhat tricky, but perfectly legal, example lets fail > my algorithm is, that it is undetermined whether the collection is in > ascending/descending order. So when it "runs" in the "wrong" direction by > accident, it fails. > Well, we catch this possibility and try twice, one for ascending order > and one for descending. O(log n) is not affected by this. OK, try this: | collection | collection := #('aa' 'ab' 'ac' 'ba' 'bb' 'bc') asSortedCollection: [:a :b | a first < b first]. collection allSatisfy: [:each | collection includes: each] This should return true, but with your code, it returns false. Your code only handled my example test case which all elements were sort block equal. It does not handle the general case of having several items that are sort block equal to the item you are searching for. I think you will need to modify your algorithm to go something like: binarySearchFor: anObject (self minimumIndexFor: anObject) to: (self maximumIndexFor: anObject) do: [:i | (self at: i) = anObject ifTrue: [^true]]. ^false Now for the min/maximumIndexFor: methods, you can do your binary search. I'll let you fill in the details for these methods. John Brant |
"John Brant" <[hidden email]> schrieb im Newsbeitrag
news:dbeS6.15071$[hidden email]... > > basicAt: was introduced in OrderedCollection>>includes: for performance, > so > > it must be legal to use it in SortedCollection. > > However, you are assuming that the collection begins at 1. Try this: > #(a b c) asSortedCollection removeFirst; includes: #b > John, I replace l := 1. r := (self size). with l := firstIndex. r := lastIndex. and continue using basicAt:. That's what OrderedCollection>>includes: does. That was my fault. I understand, that our correspondence here comes from a too strict (or sloppy) understanding of the term "sorted" at my side. What you introduce with your fail-cases are partial key ordering problems. Of course are they covered by the totaly free definable SortedCollection SortBlock. I should have defined "sorted" as "strictly sorted", where strictly means no partial keys. What the searchBlock compares, is what we are looking for. Not a part of it. As a consequence we would have to introduce a SortedCollection subclass "StrictlySortedCollection". I don't know whether this is desirable. Of course are you right, that my algorithm fails for this kind of problems. Undoubltly one needs to scan the partitioned collection in a linear manner, to securely find an element in the most general case. I admit frankly, that I didn't regognize the possible existence of partial keys. I believe that falling back a priori to this strategy is not optimal, because by far not all collections have this kind of ordering. What can we lose if trying a binSearch first ? IFF it fails, we can do a second run with that linear search strategy. If we are unlucky, we lose exactly this O(log n) time, what is not much. O(n + log n) worst case, but O(log n) if we are lucky. On the other hand, if using the modified linear method, which does a partioning via binSearch, a priori, we invest too much and get possibly too few for it. O(n) in every case. If it would be my only choice either standard linear search as in includes: , or a modified "pre-binSearch", I wouldn't bother, because both are log(n). Anyway, thanks for the interesting chat. Ingo |
In reply to this post by Ingo Blank
"Ingo Blank" <[hidden email]> wrote in message
news:3b18e45a$0$135$[hidden email]... > Hi Ingo, I concur with your assertions here. The QKS Smalltalk frameworks implemented <SortedCollection> methods for: #includes: #indexOf:ifAbsent: #includesEquivalent: (includes-entry-equivalent-to) #includesIdentical: (includes-entry-identical-to) "" Heart of the mechanism #insertionIndexOf: All using binary search mechanics. Almost all the above methods are implemented in terms of #insertionIndexOf:. The algorithms are carefully analyzed and tuned for Smalltalk performance -- i.e. depending on the collection size a linear scan may be used. As your example benchmarks below suggest, such an approach makes a very significant performance difference. I recall that we were able to provide some 100-fold increase in parts of our toolset just by using a <SortedCollection> instead of an <OrderedCollection>. Note, the algorithm for searching can (and should) be written in terms of the <sortBlock> to avoid various issues you mention. The algorithm provided here is from "SmalltalkAgents - QKS Smalltalk 98". The newer SmallScript system uses a slightly different algorithm. method behavior: SortedCollection [ insertionIndexOf: tObject “Description: Return the index of the slot that would be used to insert the given object. If the list would have to be expanded to insert the value then evaluate the absent handler.” thisContext onSignalDo: [^self basicSize+1]. | width size base | “Preflight values” (* base := 0. “Default is <nil> anyway” *) size := width := self basicSize. “Scan and bisect as long as their is a semispace available” [width] whileTrue: [ | index | index := (width // 2) + 1. "If true then we scan the right semispace" (sortBlock value: (self basicAt: base + index) value: tObject) ifTrue: [ “Adjust the base to select the origin of right-hand semispace” base := base + index. “Converge the width cleverly (i.e., not "width := index"), taking advantage of integer division truncation, to be the width of the right side removing the value that was just tested.” width := width - index ] "If false we scan the left semi-space" ifFalse: [ “Converge the width to that of the left-hand semispace removing the value that was just tested. We leave the base as is since it is already at the start of the left semispace.” width := index - 1 ]. ]. “Return the location where the value would be inserted” ^(base+1) ]. -- Dave S. [http://www.qks.st] > Hi, > > while browsing the Containers sources, I discovered a performance leak. > SortedCollection>>includes: does a _linear_ search on a *sorted* collection. > We all(?) know however, that this can be done in O(log(n)) time complexity > using > a binary search algorithm. > > I'm aware that in Smalltalk the simple algorithm is not applicable, because > the user can > define her/his own SortBlock. > > But while preserving the sort property invariant, (i)(a1 <= a2 <= ...<= an) > respectively > (ii)(an <= ... <= a2 <= a1), for ascending/descending sorted collections, we > can use this modified version, > which determines with a simple test, whether the collection is in > ascending/descending order. > > a1 <= an --> ascending, assuming sort property (i) holds (inductive closure) > an <= a1 --> descending, dto. for (ii) > > Furthermore, a collection is sortable, iff the elements class provides a > "<=" relation. > > So, if our collection holds the sort property invariant *and* our algorithm > uses these operators > (<,= or <=) only, we can use this algorithm. (q.e.d.) > > Of course the sort order property can be broken by a pathological user > defined SortBlock, > but then, we have no *SortedCollection* at all because it simply isn't > sorted. Then and only then, the algorithm fails. > > I think, this is a sufficient formal proof for introducing the > SequenceableCollection>>binarySearchFor: method and safely overwriting > SortedCollection>>includes: > > ------------------ SNIP ------------------ > > !SortedCollection methodsFor! > > includes: target > ^self binarySearchFor: target > > The well known algorithm, modified for ascending/descending collections: > > ------------------ SNIP ------------------ > > !SequenceableCollection methodsFor! > > binarySearchFor: aKey > |l r m found| > "wp: receiver has to be sorted" > > l := 1. > r := self size. > > found := false. > > (self basicAt: l) <= (self basicAt: r) ifTrue: [ "ascending order" > > [found not and: [l <= r]] whileTrue:[ > m := (l + r) // 2. > (found := ((self basicAt: m) = aKey)) ifFalse:[ > aKey < (self basicAt: m) ifTrue: [ r := m - 1] > ifFalse: [ l := m + 1] > ] > ] > ] > > ifFalse: [ "descending order" > > [found not and: [l <= r]] whileTrue:[ > m := (l + r) // 2. > (found := ((self basicAt: m) = aKey)) ifFalse:[ > aKey < (self basicAt: m) ifTrue: [ l := m + 1] > ifFalse: [ r := m - 1] > ] > ] > > ]. > > > ^found > > > --------------- SNIP ---------------- > > I believe the examples dont' have to be commented. > > oc := OrderedCollection new. > 100000 timesRepeat:[oc add: (RandomString new: 10)]. > sc := oc asSortedCollection. > dc := oc asSortedCollection sortBlock: [:sx :sy | sy <= sx]. > > > Time millisecondsToRun:[1000 timesRepeat:[oc includes: oc any]] 4108 > Time millisecondsToRun:[1000 timesRepeat:[sc includes: sc any]] 28 > Time millisecondsToRun:[1000 timesRepeat:[dc includes: dc any]] 28 > > (N.B. Collection>>any picks a random element) > > -------------- SNIP -------------------- > > After all you might think that this is not important, because Smalltalk > much better methods for > "Set representations" than sorted collections. These use hash algorithms, > that are O(1), which is of > course transcender. Well, that's right, but if/when we already *have* a > SortedCollection, then > we definitely *should* use a better search operation than linear. > > > > Bye > Ingo > > > > |
In reply to this post by Ingo Blank
"Ingo Blank" <[hidden email]> wrote in message
news:3b19ad22$0$139$[hidden email]... > I understand, that our correspondence here comes from a too strict (or > sloppy) understanding > of the term "sorted" at my side. What you introduce with your fail-cases are > partial key > ordering problems. Of course are they covered by the totaly free definable > SortedCollection > SortBlock. I should have defined "sorted" as "strictly sorted", where > strictly means no partial > keys. What the searchBlock compares, is what we are looking for. Not a part > of it. > As a consequence we would have to introduce a SortedCollection subclass > "StrictlySortedCollection". > I don't know whether this is desirable. > > Of course are you right, that my algorithm fails for this kind of problems. > Undoubltly one needs to scan the partitioned collection in a linear manner, > to securely find an element in the most general case. I admit frankly, that > I didn't regognize > the possible existence of partial keys. > > I believe that falling back a priori to this strategy is not optimal, > because by far not > all collections have this kind of ordering. What can we lose if trying a > binSearch first ? > IFF it fails, we can do a second run with that linear search strategy. If we > are unlucky, > we lose exactly this O(log n) time, what is not much. O(n + log n) worst > case, but > O(log n) if we are lucky. O(n + log n) is O(n). > On the other hand, if using the modified linear method, which does a > partioning via binSearch, > a priori, we invest too much and get possibly too few for it. O(n) in every > case. I don't understand why we would be investing too much here since it is also bounded by O(n) and o(log n). Anyway, here's a version that finds the left most position, and then does a linear scan from there as long as the elements are sort block equal: binaryIncludes: anObject | includesEqual start end middle | self size = 0 ifTrue: [^false]. includesEqual := sortBlock value: anObject value: anObject. start := firstIndex. end := lastIndex. [end - start > 1] whileTrue: [middle := (start + end) // 2. (includesEqual ifTrue: [sortBlock value: anObject value: (self basicAt: middle)] ifFalse: [(sortBlock value: (self basicAt: middle) value: anObject) not]) ifTrue: [end := middle] ifFalse: [start := middle]]. start ~~ end ifTrue: [anObject = (self basicAt: start) ifTrue: [^true] ifFalse: [start := end]]. [anObject = (self basicAt: start) ifTrue: [^true]. start := start + 1. start <= lastIndex and: [includesEqual ifTrue: [sortBlock value: (self basicAt: start) value: anObject] ifFalse: [(sortBlock value: anObject value: (self basicAt: start)) not]]] whileTrue. ^false I tested this with a 10000 symbol collection, testing the includes for each symbol: 32 seconds for normal Dolphin #includes: 600 ms for #binaryIncludes: when the symbols are sorted by #<= 12 seconds for #binaryIncludes: when the symbols are sorted by [:a :b | a first <= b first] 102 seconds for #binaryIncludes: when the symbols aren't sorted (e.g., [:a :b | false]) BTW, Dolphin can't sort 10000 elements if you have a sortBlock of [:a :b | true]. It has a stack overflow. Anyway, the times depends on your collection type. If your sort block is highly selective, then you should get good performance, otherwise it might be a lot worse. Of course, using a Set is still a lot faster than the best case #binaryIncludes:. Now, here's my final test case that makes it hard to use #binaryIncludes: for normal #includes: 'abc' asSortedCollection includes: 3 My draft of the ANSI standard says that includes: can only return true or false, and cannot raise an error. If you try to evaluate the sort block, you will get an error. John Brant |
John,
> I don't understand why we would be investing too much here since it is also > bounded by O(n) and o(log n). Yes, that was a misunderstanding at my side. > Anyway, here's a version that finds the left > most position, and then does a linear scan from there as long as the > elements are sort block equal: > binaryIncludes: anObject > | includesEqual start end middle | > self size = 0 ifTrue: [^false]. > includesEqual := sortBlock value: anObject value: anObject. > start := firstIndex. > end := lastIndex. > [end - start > 1] whileTrue: > [middle := (start + end) // 2. > (includesEqual > ifTrue: [sortBlock value: anObject value: (self basicAt: > middle)] > ifFalse: [(sortBlock value: (self basicAt: middle) value: > anObject) not]) > ifTrue: [end := middle] > ifFalse: [start := middle]]. > start ~~ end > ifTrue: > [anObject = (self basicAt: start) ifTrue: [^true] ifFalse: > [start := end]]. > > [anObject = (self basicAt: start) ifTrue: [^true]. > start := start + 1. > start <= lastIndex and: > [includesEqual > ifTrue: [sortBlock value: (self basicAt: start) value: > anObject] > ifFalse: [(sortBlock value: anObject value: (self basicAt: > start)) not]]] > whileTrue. > ^false > > I tested this with a 10000 symbol collection, testing the includes for > symbol: > 32 seconds for normal Dolphin #includes: > 600 ms for #binaryIncludes: when the symbols are sorted by #<= > 12 seconds for #binaryIncludes: when the symbols are sorted by [:a :b | > a first <= b first] > 102 seconds for #binaryIncludes: when the symbols aren't sorted (e.g., > [:a :b | false]) Well, I believe that is a very good result. And of course *much* better than the OrderedCollection>>includes:. It has no trade off compared with the native binarySearchFor: and covers the general case. We can't get more. That was exactly my intention, but my first tries were obviously a little bit too naive, because I underestimated the power of the sortBlock... > BTW, Dolphin can't sort 10000 elements if you have a sortBlock of [:a :b | > true]. It has a stack overflow. > > Anyway, the times depends on your collection type. If your sort block is > highly selective, then you should get good performance, otherwise it might > be a lot worse. Of course, using a Set is still a lot faster than the best > case #binaryIncludes:. That's what I said. Smalltalk has much transcender mechanisms for efficient Set representation than SortedCollections. But if/when we have generated one, we should use their properties. > Now, here's my final test case that makes it hard to use #binaryIncludes: > for normal #includes: > 'abc' asSortedCollection includes: 3 > My draft of the ANSI standard says that includes: can only return true or > false, and cannot raise an error. If you try to evaluate the sort block, you > will get an error. As far as I understand this, the exceptions come from comparing incompatible data types. (char with int in your example). Could you wrap the compares in catch blocks, and return false if/when an exception occurs ? > John Brant Bye and thanks Ingo |
In reply to this post by David Simmons
"Paolo Bonzini" <[hidden email]> wrote in message
news:[hidden email]... > > The QKS Smalltalk frameworks implemented <SortedCollection> methods for: > > > > #includes: > > #indexOf:ifAbsent: > > > > "" Heart of the mechanism > > #insertionIndexOf: > > > > All using binary search mechanics. Almost all the above methods are > > implemented in terms of #insertionIndexOf:. > > > > Note, the algorithm for searching can (and should) be written in terms > > of the <sortBlock> to avoid various issues you mention. > > That's plain. But the QKS Smalltalk implementation that you posted is > wrong, because John's counterexamples will probably have it fail miserably > (unless special code in #includes:, that you did not post, works around > the bugs). In particular when I tried this simple test on GNU Smalltalk > it failed > > | a sc | > a := #('aa' 'ac' 'ab' 'bc' 'bb' 'ba' 'cc' 'ca' 'cb'). > sc := a asSortedCollection: [ :x :y | x first <= y first ]. > ^a allSatisfy: [ :each | sc includes: each ] The above does work correctly when I test it on QKS Smalltalk. I'm not sure what you refer to as the #insertionIndexOf: "bugs" that need to be worked around by the "include:/indexOf:ifAbsent:" method(s). The "bugs" that need to be worked around are those which would be in an incorrect sortBlock algorithm like the one above which collates using only the first character and thus effectively leaves the collection "as is". I performed the test with both the stock version which first attempts a binary search for lists greater than 42 entries in size. And then I hacked the code to perform a binary scan for lists greater than 1 entry in size. Both variants worked correctly (because if the binary search fails they [must necessarily] default to a linear scan). I.e., the binary search is an optimization. There is no guarantee that a <SortedList> is actually sorted properly -- that depends entirely on the "correctness" of the sortBlock's algorithm. Which is "not correct" in the above example. "" #collect: ... a map: [:e| sc insertionIndexOf: e]. "" Returns the correct values: {4,4,4,7,7,7,10,10,10} Here's the other pieces of code you wanted: ------------------------------------------ Method behavior: SortedCollection [ indexOf: object ifAbsent: absentHandler "Description: Return the index of an object equal to the given object or evaluate the absentHandler if the object is not found." | size | "First, determine if we have any entries. Then based on that value decide if a binary search is more efficient than a linear scan. The empirical crossover point is around 42 depending on the type of data and the form of testing (ident/equal/equiv)." ((size := self basicSize) ifZero: [^absentHandler value]) > 42 ifTrue: [|index| "Now, find the closest index" index := self insertionIndexOf: object. "Depending on whether the sortBlock includes an equals test we will get the index or the index-1. Furthermore, the sortblock test may not actually compare the objects so this is only an optimization; if it fails we still need to perform a full linear scan." ((self basicAt: (index min: size)) = object) ifTrue: [^index]. ((self basicAt: (index-1 max: 1)) = object) ifTrue: [^index-1]. ]. "Perform a linear scan" 1 to: size do: [:i | ((self basicAt: i) = object) ifTrue: [^i]. ]. "If we failed after the thorough scan evaluate the absentHandler" ^absentHandler value ]. Method behavior: SortedCollection [ includes: object "Description: Return a boolean value indicating whether the receiver includes and object equal to the given element." ^(self indexOf: object ifAbsent: []) asBoolean ]. -- Dave S. [www.smallscript.net] > > I worked around the problem this morning with a little extra code (about > 50 lines); I also had to add exception handling which I guess is what the > send to thisContext does in your code. For performance, I split the > binary search mechanisms in a special method that only uses sort blocks > and is used when the element is intended to be in the collection (#add: > and so forth), and another that does all the linear search crap. > > SortedCollection implementation is far from intuitive if you want to do it > efficiently. As an additional example, consider that you have to provide > equally good mechanisms for indexed accessing (usually you do a > pre-sorting pass for this) and for incremental construction (where you > usually use trees, either balanced or not)! In GNU Smalltalk I coalesce > successive #add: operations, support a heap data structure if they are > only using the SortedCollection as a priority queue, and have two sorting > algorithms (heapsort and quicksort). This data structure gives O(n log n) > behavior unless the client frequently interleaves adding and searching. > > Paolo > > > > -- > Posted from IDENT:[hidden email] [131.175.16.193] > via Mailgate.ORG Server - http://www.Mailgate.ORG |
I should probably conjecture that one could potentially avoid the fallback
to the linear scan by pursuing some line of reasoning regarding a test of the form (during the index scan): (sortBlock value: (self at: a) value: (self at: b) == sortBlock value: (self at: b) value: (self at: a)) ifTrue: [ "" bogus sort-block so do a linear scan ]. Which using a SmallScripts ability to use a more terse syntax can also be written as: if (sortBlock(self[a],self[b]) == sortBlock(self[b],self[a])) [ ]. -- Dave S. [www.smallscript.net] P.S., arghh! I have to train myself to stop reflex typing "their" when I meant to type "there"; or practice being less hasty in my post-proofreading. It is irritating to write the post, send it, then read-it and immediately spot such stupid typos. "David Simmons" <[hidden email]> wrote in message news:Ji9T6.65719$%[hidden email]... > "Paolo Bonzini" <[hidden email]> wrote in message > news:[hidden email]... > > > The QKS Smalltalk frameworks implemented <SortedCollection> methods for: > > > > > > #includes: > > > #indexOf:ifAbsent: > > > > > > "" Heart of the mechanism > > > #insertionIndexOf: > > > > > > All using binary search mechanics. Almost all the above methods are > > > implemented in terms of #insertionIndexOf:. > > > > > > Note, the algorithm for searching can (and should) be written in terms > > > of the <sortBlock> to avoid various issues you mention. > > > > That's plain. But the QKS Smalltalk implementation that you posted is > > wrong, because John's counterexamples will probably have it fail > > (unless special code in #includes:, that you did not post, works around > > the bugs). In particular when I tried this simple test on GNU Smalltalk > > it failed > > > > | a sc | > > a := #('aa' 'ac' 'ab' 'bc' 'bb' 'ba' 'cc' 'ca' 'cb'). > > sc := a asSortedCollection: [ :x :y | x first <= y first ]. > > ^a allSatisfy: [ :each | sc includes: each ] > > The above does work correctly when I test it on QKS Smalltalk. I'm not > what you refer to as the #insertionIndexOf: "bugs" that need to be worked > around by the "include:/indexOf:ifAbsent:" method(s). The "bugs" that need > to be worked around are those which would be in an incorrect sortBlock > algorithm like the one above which collates using only the first character > and thus effectively leaves the collection "as is". > > I performed the test with both the stock version which first attempts a > binary search for lists greater than 42 entries in size. And then I hacked > the code to perform a binary scan for lists greater than 1 entry in size. > > Both variants worked correctly (because if the binary search fails they > [must necessarily] default to a linear scan). > > I.e., the binary search is an optimization. There is no guarantee that a > <SortedList> is actually sorted properly -- that depends entirely on the > "correctness" of the sortBlock's algorithm. Which is "not correct" in the > above example. > > "" #collect: ... > a map: [:e| sc insertionIndexOf: e]. > > "" Returns the correct values: > {4,4,4,7,7,7,10,10,10} > > Here's the other pieces of code you wanted: > ------------------------------------------ > Method behavior: SortedCollection > [ > indexOf: object > ifAbsent: absentHandler > > "Description: Return the index of an object equal to the given object > evaluate > the absentHandler if the object is not found." > > | size | > > "First, determine if we have any entries. Then based on that value > decide if a binary search is more efficient than a linear scan. > The empirical crossover point is around 42 depending on the type > of data and the form of testing (ident/equal/equiv)." > ((size := self basicSize) ifZero: [^absentHandler value]) > 42 ifTrue: > [|index| > "Now, find the closest index" > index := self insertionIndexOf: object. > > "Depending on whether the sortBlock includes an equals test we > will get the index or the index-1. Furthermore, the > sortblock test may not actually compare the objects so this > is only an optimization; if it fails we still need to > perform a full linear scan." > ((self basicAt: (index min: size)) = object) ifTrue: [^index]. > ((self basicAt: (index-1 max: 1)) = object) ifTrue: [^index-1]. > ]. > > "Perform a linear scan" > 1 to: size do: > [:i | > ((self basicAt: i) = object) ifTrue: [^i]. > ]. > > "If we failed after the thorough scan evaluate the absentHandler" > ^absentHandler value > ]. > > Method behavior: SortedCollection > [ > includes: object > > "Description: Return a boolean value indicating whether the receiver > includes and > object equal to the given element." > > ^(self indexOf: object ifAbsent: []) asBoolean > ]. > > -- Dave S. [www.smallscript.net] > > > > > I worked around the problem this morning with a little extra code (about > > 50 lines); I also had to add exception handling which I guess is what > > send to thisContext does in your code. For performance, I split the > > binary search mechanisms in a special method that only uses sort blocks > > and is used when the element is intended to be in the collection (#add: > > and so forth), and another that does all the linear search crap. > > > > SortedCollection implementation is far from intuitive if you want to do it > > efficiently. As an additional example, consider that you have to provide > > equally good mechanisms for indexed accessing (usually you do a > > pre-sorting pass for this) and for incremental construction (where you > > usually use trees, either balanced or not)! In GNU Smalltalk I coalesce > > successive #add: operations, support a heap data structure if they are > > only using the SortedCollection as a priority queue, and have two sorting > > algorithms (heapsort and quicksort). This data structure gives O(n log n) > > behavior unless the client frequently interleaves adding and searching. > > > > Paolo > > > > > > > > -- > > Posted from IDENT:[hidden email] [131.175.16.193] > > via Mailgate.ORG Server - http://www.Mailgate.ORG > > |
In reply to this post by David Simmons
"David Simmons" <[hidden email]> wrote in message
news:Ji9T6.65719$%[hidden email]... > "Paolo Bonzini" <[hidden email]> wrote in message > news:[hidden email]... > > That's plain. But the QKS Smalltalk implementation that you posted is > > wrong, because John's counterexamples will probably have it fail miserably > > (unless special code in #includes:, that you did not post, works around > > the bugs). In particular when I tried this simple test on GNU Smalltalk > > it failed > > > > | a sc | > > a := #('aa' 'ac' 'ab' 'bc' 'bb' 'ba' 'cc' 'ca' 'cb'). > > sc := a asSortedCollection: [ :x :y | x first <= y first ]. > > ^a allSatisfy: [ :each | sc includes: each ] > > The above does work correctly when I test it on QKS Smalltalk. I'm not > what you refer to as the #insertionIndexOf: "bugs" that need to be worked > around by the "include:/indexOf:ifAbsent:" method(s). The "bugs" that need > to be worked around are those which would be in an incorrect sortBlock > algorithm like the one above which collates using only the first character > and thus effectively leaves the collection "as is". Sorting only on the first character is perfectly valid. Here's the properties of a sort block from the ANSI standard draft (revision 1.9, page 219): ----- 1. Given the same 2 parameters, the sort block must answer the same result. 2. The sort block must obey transitivity. For example, if a is before b, and b is before c, then a must be before c. ----- Now, you may be complaining that the sort block is using #<= instead of #<. The standard draft seems to imply sort blocks should not return true for equal elements ("... answers true if the first parameter should be ordered before the second parameter, and false otherwise."). Both VW and Dolphin use a default sort block of #<=. However, they would sort collections with many of the same elements faster if they used #<. For example, in VW if I sort a collection of 10,000 1's using the default sort block takes ~7.5 seconds on my machine, but if I use a #< sort block, it takes ~45 milliseconds. John Brant |
"John Brant" <[hidden email]> wrote in message
news:QIAT6.24663$[hidden email]... > > "David Simmons" <[hidden email]> wrote in message > news:Ji9T6.65719$%[hidden email]... > > "Paolo Bonzini" <[hidden email]> wrote in message > > news:[hidden email]... > > > > That's plain. But the QKS Smalltalk implementation that you posted is > > > wrong, because John's counterexamples will probably have it fail > miserably > > > (unless special code in #includes:, that you did not post, works > > > the bugs). In particular when I tried this simple test on GNU Smalltalk > > > it failed > > > > > > | a sc | > > > a := #('aa' 'ac' 'ab' 'bc' 'bb' 'ba' 'cc' 'ca' 'cb'). > > > sc := a asSortedCollection: [ :x :y | x first <= y first ]. > > > ^a allSatisfy: [ :each | sc includes: each ] > > > > The above does work correctly when I test it on QKS Smalltalk. I'm not > sure > > what you refer to as the #insertionIndexOf: "bugs" that need to be > > around by the "include:/indexOf:ifAbsent:" method(s). The "bugs" that need > > to be worked around are those which would be in an incorrect sortBlock > > algorithm like the one above which collates using only the first character > > and thus effectively leaves the collection "as is". > > Sorting only on the first character is perfectly valid. Here's the > properties of a sort block from the ANSI standard draft (revision 1.9, page > 219): > ----- > 1. Given the same 2 parameters, the sort block must answer the same result. > 2. The sort block must obey transitivity. For example, if a is before b, and > b is before c, then a must be before c. > ----- Hi John, My language was not particularly precise in the previous post. In other words, the <sortBlock> was perfectly valid and there was no ANSI non-conformance. What I was citing as being "wrong" was the basic algorithm did not actually provide a proper "binary-sort" with respect to object "=" comparison used in #include:. As I was suggesting, and as I think Paolo may have implemented, one can do additional tests using the sortBlock during a binary search to ascertain that it is not consistent with the "=" test and therefore fall-back to a linear scan. I.e., 'ab' != 'ac' "" But, comparing via the sortBlock sortBlock('ab','ac') => true "" And the reversed form: sortBlock('ac','ab') => true "" Would suggest that the collation requirements "" for a binary sort will not hold true. P.S., sortBlock(x,y) is just a SmallScript variant for sortBlock value: x value: y. And #~= is aliased to #!=. (so they are identical messages) > > Now, you may be complaining that the sort block is using #<= instead of #<. > The standard draft seems to imply sort blocks should not return true for > equal elements ("... answers true if the first parameter should be ordered > before the second parameter, and false otherwise."). Both VW and Dolphin use > a default sort block of #<=. However, they would sort collections with many > of the same elements faster if they used #<. For example, in VW if I sort a > collection of 10,000 1's using the default sort block takes ~7.5 seconds on > my machine, but if I use a #< sort block, it takes ~45 milliseconds. I must confess that I don't understand why #< is faster than #<=. Our default sortBlock does happen to use #<, but the #<= vs #< methods perform at essentially identical speeds. It has been too long since I thought this through, so I don't remember why I chose #< as opposed to #<=. Given that we use a bisection search algorithm for insertion points, the choice of #< vs #<= only results in a left-semispace bias instead of a right-semispace bias for your example. I haven't actually run the example to compare so I might be missing something? In the case of virtual-oops (SmallIntegers, Characters), the JIT inlines all the comparison forms so they should perform equivalently. In the general case, <IComparable> objects must (minimally) implement a #compare: method which returns (-1,0,1). The <IComparable> interface provides default implementations of #<, #>, etc in terms of the #compare: method. An extended compare interface form may support additional features for equivalence comparison such as case-insensitive comparison, or human-language/region-script sensitive comparison, etc. --- Curious now... Hmmm. wait a minute. Choosing the left-semispace consistently will tend to cause insertion to occur at the start of the collection which is distinctly slower. So, based on our current implementation, it should be much faster to use a #<= comparison. Ok, I did some seat-of-the-pants benchmarks and the difference is very significant. Case SmallScript(STS) VisualWorks ---- ----------- ----------- < 150ms 18ms <= 19ms 2567ms The cases are reversed for Smallscript/QKS-Smalltalk vis-a-vis VisualWorks 5i3. Upon further investigation it depends, as I speculated, on the way in which the insertion is handled. As I speculated the #<= case is causing insertions to take place at the end of the collection. Whereas, the #< case is causing insertions to take place at the start of the collection. In STS it is much cheaper to add to the last indexed slot of the collection because no slots are copied down. Hence we see a time of ~19ms for the test. Whereas adding to the end takes some 150ms because of all the slot copying-up operations. But in VW it is apparently much cheaper to add to the front of the collection (unfortunately the algorithm source is pretty convoluted to follow through its various methods (and there are no comments) so I can't readily see exactly why). I gather that the algorithm is trying to manage slot-resize-growth at the Smalltalk layer using #become: etc. [this feature is built into the AOS Platform VM mutator for all objects; where the resize/growth policy is basic property for all objects which is inherited from its class by default]. So in VW we see that the #< which causes insertions at the start of the collection takes about ~18ms. Whereas when it goes through its convoluted code case to add to the end it takes 2567ms (which is a really long time compared to 150ms for SmallScript). I'm guessing that however it is copying down/up the slots it is fairly slow (probably doing range-checks and slot index calculations etc). See #insert: "anObject" before: "spot". Where <spot> is "1". ---- -- Dave S. [www.smallscript.net] > > > John Brant > > |
In reply to this post by John Brant
John
You wrote in message news:QIAT6.24663$[hidden email]... > ... > The standard draft seems to imply sort blocks should not return true for > equal elements ("... answers true if the first parameter should be ordered > before the second parameter, and false otherwise."). I thought the standard explicitly stated somewhere that the default sort block relationship should be #<, hence the #todo note in SortedCollection>>value:value: (which implements the default sort "block" in Dolphin). Ah yes, pages 227 and 228 in the specs for <SortedCollection factory> #new and #new:, "A sort block is supplied wich guarantees thtat the elements will be sorted in ascending order as specified by the #< message for the elements". >...Both VW and Dolphin use > a default sort block of #<=. However, they would sort collections with many > of the same elements faster if they used #<. For example, in VW if I sort a > collection of 10,000 1's using the default sort block takes ~7.5 seconds on > my machine, but if I use a #< sort block, it takes ~45 milliseconds. Too true, but it is for backwards compatibility having been defined like that in Smalltalk-80. Normally one wouldn't worry about it that much, except that as a result people have often implemented #<= on their classes so that they work with the default sort block. Changing it unilaterally would break a lot of people's stuff I think (though I wish we'd changed it in the early days to avoid perpetuating it). Regards Blair |
"Blair McGlashan" <[hidden email]> wrote in message
news:9fni48$52nmn$[hidden email]... > You wrote in message > news:QIAT6.24663$[hidden email]... > >...Both VW and Dolphin use > > a default sort block of #<=. However, they would sort collections with > many > > of the same elements faster if they used #<. For example, in VW if I sort > a > > collection of 10,000 1's using the default sort block takes ~7.5 seconds > on > > my machine, but if I use a #< sort block, it takes ~45 milliseconds. > > Too true, but it is for backwards compatibility having been defined like > that in Smalltalk-80. Normally one wouldn't worry about it that much, except > that as a result people have often implemented #<= on their classes so that > they work with the default sort block. Changing it unilaterally would break > a lot of people's stuff I think (though I wish we'd changed it in the early > days to avoid perpetuating it). It isn't a hard bug to fix. I just looked at my Dolphin image, and there are only 9 classes that define #<= but do not define #<. I think making people fix their code would not be that hard. In fact, I bet running a script such as: | block | block := [:cls | ((cls includesSelector: #<=) and: [(cls canUnderstand: #<) not]) ifTrue: [ | source | source := cls sourceCodeAt: #<=. cls compile: (source copyReplaceAll: '<=' with: '<') categories: (cls compiledMethodAt: #<=) categories]]. Object withAllSubclasses do: block. Object class withAllSubclasses do: block would fix most classes. John Brant |
In reply to this post by David Simmons
"Paolo Bonzini" <[hidden email]> wrote in message
news:[hidden email]... > >Case SmallScript(STS) VisualWorks > >---- ----------- ----------- > >< 150ms 18ms > ><= 19ms 2567ms > > FWIW, GNU Smalltalk's times are about the same as SmallScript (171 ms with > <, 21 ms with <=) on a Celeron/400 -- don't know what's your machine. I'm a bit skeptical about those numbers. On the surface of it, the 21ms number seems too fast without special casing support. I just spent a couple hours with Blair McGlashan (Object-Arts' Dolphin) comparing the implementations in each of the various Smalltalk dialects. What we concluded is that the VisualAge approach (with deferred [heap] sorting using a bubble-sort algorithm and binary searching) has the best overall execution characteristics of the existing implementations. But the important lessons were: a) There is modest performance value to be gained in deferring the actual sorting. But to really exploit this requires a careful strategy for implementing and choosing a quicksort vs bubble-sort algorithm. If you're not prepared to provide a careful strategy then using a bubble-sort algorithm is distinctly more efficient (than what most current Smalltalk's provide) for both fringe cases and random element cases. b) The overall driver on performance is actually the cost of shuffling elements around in the collection (not the comparison operations). As a result using techniques like #basicNRCSlotAt: (from SmallScript), or using primitives to shift slot elements up or down in a collection makes a dramatic performance difference. All in all, this is counter intuitive to what a C/C++ developer would expect. The reason is that C/C++ doesn't perform dereferencing and range checking within the code portions of a sort/insertion function that moves objects/entries around. In Smalltalk the cost of moving those elements around typically far outweighed the block-value operations. To allow designs that compete with C/C++ requires either a (C/C++) primitive or the using a JIT supported basicNRC... mechanism to allow elimination of the range checking and dereferencing code. The availability of a basicNRC mechanism allows writing all the code in Smalltalk. The use of deferred sorts allows even more efficiency in selecting and implementing merge algorithms. In any case, the machine we did the tests on has a 1.2GHz AMD Duron running Windows 2000 Service Pack 2 [with Netmeeting running on the machine]. We tested the most recently available versions of VisualWorks, VisualAge, Dolphin, SmalltalkMT, Squeak, and SmallScript/SmalltalkAgents. (Blair, I tested squeak after we quit NetMeeting). -- SIDEBAR -- FYI for other testers, squeak really eats idle cycles and typically skews benchmarks in other applications by 30%. So don't try benchmarks with squeak running; the reverse is not true. I.e., having other Smalltalks running when you benchmark squeak has nominal impact. -- END SIDEBAR -- The JIT based Smalltalks were distinctly faster in their fast cases, the Smalltalks with primitives for shuffling slot-elements were much faster than those using equivalent "pure classic smalltalk" algorithms without such facilities [whether they were jitted or not]. Assuming all things being equal (which is not really true), the GNU Smalltalk numbers you gave for a 400MHz machine would roughly scale to be 7ms and 57ms on our test machine. I suspect that achieving those (7ms) numbers requires special case code and C/C++ code (unless one has a JIT supporting basicNRC...). So my question to you is: Do you have special case code in for the degenerate cases? Do you have a JIT? Are there any primitives involved in the collection element insertion or sorting code? OVERALL LESSON: Design of the algorithms is very important especially in relation to the characteristics of a given Smalltalk implementation. P.S., perhaps the most interesting side-thing about our efforts was that we found wild differences in the browser facilities and the ease of use factors among the Smalltalks. The same sequences in one system did not translate to another and in many cases various natural/intuitive features were missing between various Smalltalks. I.e., browsing skills in one smalltalk IDE did not directly translate to other smalltalk IDEs. We understood the features we wanted and how to navigate, but actually finding [if they were even present] the right elements (menus, keystrokes, tools) and figuring out how to use them properly was not directly transferable from one system to another. Cheers, -- Dave S. [www.smallscript.net] > > Now I cannot really give any meaning to this, since the algorithms that > GNU Smalltalk employ are quite sophisticated and based on heaps. What's > more, that building the heap takes 50% of the time <=, and 95% with <. > Now I must go and do some profiling! > > Paolo > > > -- > Posted from IDENT:[hidden email] [131.175.16.193] > via Mailgate.ORG Server - http://www.Mailgate.ORG |
"David Simmons" <[hidden email]> wrote in message
news:30dU6.74274$%[hidden email]... > "Paolo Bonzini" <[hidden email]> wrote in message > news:[hidden email]... > > >Case SmallScript(STS) VisualWorks > > >---- ----------- ----------- > > >< 150ms 18ms > > ><= 19ms 2567ms > > > > FWIW, GNU Smalltalk's times are about the same as SmallScript (171 ms with > > <, 21 ms with <=) on a Celeron/400 -- don't know what's your machine. > > I'm a bit skeptical about those numbers. On the surface of it, the 21ms > number seems too fast without special casing support. > > I just spent a couple hours with Blair McGlashan (Object-Arts' Dolphin) > comparing the implementations in each of the various Smalltalk dialects. > What we concluded is that the VisualAge approach (with deferred [heap] > sorting using a bubble-sort algorithm and binary searching) has the best > overall execution characteristics of the existing implementations. Actually, VA doesn't have any bubble-sort algorithm in it. It is a heap sort. The *bubble* methods are maintaining the heap by moving objects up/down in the heap. VA also uses a merge sort when performing large #addAll:'s. Since I've always heard that quicksort is faster than heapsort for almost all real world problems, I decided to see why VA is faster. First I tried: Time millisecondsToRun: [(10000 to: 1 by: -1) asSortedCollection] Time millisecondsToRun: [(1 to: 10000) asSortedCollection] Sure enough VA was faster (VA would either return 0 or 10 ms to run the first one -- I don't think VA has a good millisecond clock; it seems that most numbers returned are multiples of 10). Now, since I knew that the people that told me about quicksort were smart people, I thought that there must be something wrong with my tests. After thinking about it for awhile, I realized that VA wasn't really sorting the collection, it was just making a heap. So I modified my tests by sending #first to the sorted collection. This causes the collection to be sorted if it isn't already. After adding the #first message, here are the times (ms) that I got: Collection VA VW 10000-1 70, 70, 60 28, 28, 29 1-10000 100, 90, 100 27, 28, 33 random (1-10000) 80, 171, 150 46, 43, 52 The random test used the same random literal array of integers on both sides. This fits my overall impression of VA vs. VW: when I run simple benchmarks, VA seems faster, but when I run real code, VW seems faster. One interesting test that I did not perform is the addAll: method. Since VA uses a merge sort and VW resorts the whole list, it would be interesting to see the performance of each. > b) The overall driver on performance is actually the cost of shuffling > elements around in the collection (not the comparison operations). As a > result using techniques like #basicNRCSlotAt: (from SmallScript), or using > primitives to shift slot elements up or down in a collection makes a > dramatic performance difference. This is the problem with heapsort. Although it is O(n log n), it does too much shuffling elements up/down the heap. > P.S., perhaps the most interesting side-thing about our efforts was that we > found wild differences in the browser facilities and the ease of use factors > among the Smalltalks. The same sequences in one system did not translate to > another and in many cases various natural/intuitive features were missing > between various Smalltalks. > > I.e., browsing skills in one smalltalk IDE did not directly translate to > other smalltalk IDEs. We understood the features we wanted and how to > navigate, but actually finding [if they were even present] the right > elements (menus, keystrokes, tools) and figuring out how to use them > properly was not directly transferable from one system to another. That's why I made the Refactoring Browser be the same in VA as it was in VW. I wanted things to work the same between the two. Of course people who know VA don't like this, but the people with a VW background have no problems with it... John Brant |
"John Brant" <[hidden email]> wrote in message
news:UDfU6.32299$[hidden email]... > "David Simmons" <[hidden email]> wrote in message > news:30dU6.74274$%[hidden email]... > > "Paolo Bonzini" <[hidden email]> wrote in message > > news:[hidden email]... > > > >Case SmallScript(STS) VisualWorks > > > >---- ----------- ----------- > > > >< 150ms 18ms > > > ><= 19ms 2567ms > > > > > > FWIW, GNU Smalltalk's times are about the same as SmallScript (171 ms > with > > > <, 21 ms with <=) on a Celeron/400 -- don't know what's your machine. > > > > I'm a bit skeptical about those numbers. On the surface of it, the 21ms > > number seems too fast without special casing support. > > > > I just spent a couple hours with Blair McGlashan (Object-Arts' Dolphin) > > comparing the implementations in each of the various Smalltalk dialects. > > What we concluded is that the VisualAge approach (with deferred [heap] > > sorting using a bubble-sort algorithm and binary searching) has the best > > overall execution characteristics of the existing implementations. > > Actually, VA doesn't have any bubble-sort algorithm in it. It is a heap > sort. The *bubble* methods are maintaining the heap by moving objects > up/down in the heap. VA also uses a merge sort when performing large > #addAll:'s. > > Since I've always heard that quicksort is faster than heapsort for almost > all real world problems, I decided to see why VA is faster. First I tried: > Time millisecondsToRun: [(10000 to: 1 by: -1) asSortedCollection] > Time millisecondsToRun: [(1 to: 10000) asSortedCollection] > Sure enough VA was faster (VA would either return 0 or 10 ms to run the > first one -- I don't think VA has a good millisecond clock; it seems that > most numbers returned are multiples of 10). You've got to watch it. Those numbers you're seeing are *not* correct. VA uses lazy sorting so the time returned is not correct until you access an element (and thus force a sort). And, as you observed, the millisecond timer values it returns are suspect since they are multiples of 10 which does not correlate with making OS calls to the various time/timer services (or directly using x86 cycle timers). Try the following in both VA and VW: "" Must access an element to force sort |c| := SortedCollection sortBlock: [:a :b| a < b]. Time millisecondsToRun: [ 10000 timesRepeat: [c add: 1]. c at: 1. ]. "" Must access an element to force sort |c| := SortedCollection sortBlock: [:a :b| a <= b]. Time millisecondsToRun: [ 10000 timesRepeat: [c add: 1]. c at: 1. ]. "" Must access an element to force sort |c| := SortedCollection sortBlock: [:a :b| a < b]. Time millisecondsToRun: [ 10000 timesRepeat: [c add: 1; at: 1] ]. "" Must access an element to force sort |c| := SortedCollection sortBlock: [:a :b| a <= b]. Time millisecondsToRun: [ 10000 timesRepeat: [c add: 1; at: 1] ]. ==== | values | values := [(10000 to: 1 by: -1). "" Must access an element to force sort Time millisecondsToRun: [values asSortedCollection at: 1]. "" Must access an element to force sort |c| := SortedCollection sortBlock: [:a :b| a < b]. Time millisecondsToRun: [ c addAll: values; at: 1. ]. "" Must access an element to force sort |c| := SortedCollection sortBlock: [:a :b| a <= b]. Time millisecondsToRun: [ c addAll: values; at: 1. ]. ==== Try it again with <values> full of random numbers or random strings, characters. > > Now, since I knew that the people that told me about quicksort were smart > people, I thought that there must be something wrong with my tests. After > thinking about it for awhile, I realized that VA wasn't really sorting the > collection, it was just making a heap. So I modified my tests by sending > #first to the sorted collection. This causes the collection to be sorted if > it isn't already. After adding the #first message, here are the times (ms) > that I got: > Collection VA VW > 10000-1 70, 70, 60 28, 28, 29 > 1-10000 100, 90, 100 27, 28, 33 > random (1-10000) 80, 171, 150 46, 43, 52 > The random test used the same random literal array of integers on both > sides. You're not seeing the VW degenerate case. The case is very clear when you compare a #< to a #<= sortBlock where all the elements in the array are 1. But it also shows up quite clearly in a Random collection of integers when comparing #< to #<=. > > This fits my overall impression of VA vs. VW: when I run simple benchmarks, > VA seems faster, but when I run real code, VW seems faster. My general experience with micro-benchmarks (which tend to be impervious to vendor's higher level algorithm issues), is that VW is about +50% faster than VA for many/most things. Under those same micro-benchmarks, SmallScript seems to typically be about the same speed or sometimes a little faster than VW. > > One interesting test that I did not perform is the addAll: method. Since VA > uses a merge sort and VW resorts the whole list, it would be interesting to > see the performance of each. We did that. > > > b) The overall driver on performance is actually the cost of shuffling > > elements around in the collection (not the comparison operations). As a > > result using techniques like #basicNRCSlotAt: (from SmallScript), or using > > primitives to shift slot elements up or down in a collection makes a > > dramatic performance difference. > > This is the problem with heapsort. Although it is O(n log n), it does too > much shuffling elements up/down the heap. No. The heap/bubble sort algorithm in VA performed quite close to VW in terms of the number of calls it made to the sortBlock. We benchmarked the number of sortBlock valuations in all cases. Actually, heap/bubble sorting in VA performed distinctly better than VW on average. QuickSort does not perform well when its elements are largely sorted, a good design would account for that by not-resorting the portion of the collection which was already sorted [something that none of the implementations does]. I.e., none of them maintain both a sorted subset, and a set of "to be lazily added" elements. It didn't matter whether we added elements one at a time via #add: or a bunch of elements via #addAll:. We used <Random> sets of numbers and sets of all 1's and all 'a' (strings). More interesting is that squeak has the what looked to be the *same* code as VW for SortedCollection, except that squeak has a primitive for copying indexed slot elements. So where VW took some ~2800ms, Squeak would take ~350ms. -- Dave S. [www.smallscript.net] > > > P.S., perhaps the most interesting side-thing about our efforts was that > we > > found wild differences in the browser facilities and the ease of use > factors > > among the Smalltalks. The same sequences in one system did not translate > to > > another and in many cases various natural/intuitive features were missing > > between various Smalltalks. > > > > I.e., browsing skills in one smalltalk IDE did not directly translate to > > other smalltalk IDEs. We understood the features we wanted and how to > > navigate, but actually finding [if they were even present] the right > > elements (menus, keystrokes, tools) and figuring out how to use them > > properly was not directly transferable from one system to another. > > That's why I made the Refactoring Browser be the same in VA as it was in VW. > I wanted things to work the same between the two. Of course people who know > VA don't like this, but the people with a VW background have no problems > with it... > > > John Brant > > |
"David Simmons" <[hidden email]> wrote in message
news:lxgU6.75498$%[hidden email]... > "John Brant" <[hidden email]> wrote in message > > Now, since I knew that the people that told me about quicksort were smart > > people, I thought that there must be something wrong with my tests. After > > thinking about it for awhile, I realized that VA wasn't really sorting the > > collection, it was just making a heap. So I modified my tests by sending > > #first to the sorted collection. This causes the collection to be sorted > if > > it isn't already. After adding the #first message, here are the times (ms) > > that I got: > > Collection VA VW > > 10000-1 70, 70, 60 28, 28, 29 > > 1-10000 100, 90, 100 27, 28, 33 > > random (1-10000) 80, 171, 150 46, 43, 52 > > The random test used the same random literal array of integers on both > > sides. > > You're not seeing the VW degenerate case. The case is very clear when you > compare a #< to a #<= sortBlock where all the elements in the array are 1. > But it also shows up quite clearly in a Random collection of integers when > comparing #< to #<=. I decided to look at it a little more. The degenerate case for VW ends up recursively sorting an empty partition and a partition that is one less than the current partition size. If you add this line of code to VW before it recursively sorts its components, you can make the degenerate case the optimal case: [k < j and: [sortBlock value: (self basicAt: k) value: dij]] whileTrue: [k := k + 1]. This is bumping up the right partition to find an element that is not sort block equal to the median element. This doesn't seem to have any noticable effect on normal sorting of numbers. It might cause the sort block to be evaluated a few more times than necessary, so if you have a slow sort block and all elements are distinct, then the other way would be faster. Another optimization you could make is to not recurse into the larger partition and handling it with a while loop instead. This will effective bound the number of recursive sort message to log n. Of course Smalltalk implementations are highly optimized for message sends, so this isn't a big win. However, the current recursive implementations can overflow the stack on some Smalltalk implementations. > > > b) The overall driver on performance is actually the cost of shuffling > > > elements around in the collection (not the comparison operations). As a > > > result using techniques like #basicNRCSlotAt: (from SmallScript), or > using > > > primitives to shift slot elements up or down in a collection makes a > > > dramatic performance difference. > > > > This is the problem with heapsort. Although it is O(n log n), it does too > > much shuffling elements up/down the heap. > > No. The heap/bubble sort algorithm in VA performed quite close to VW in > terms of the number of calls it made to the sortBlock. We benchmarked the > number of sortBlock valuations in all cases. The number of evals of the sortBlock isn't necessarily related to the number of shuffles. For example, the number of at:put:'s in VA's heap sort of (1 to: 10000) is 237263, but for VW's quicksort it is 0 (swap:with: isn't called). Whereas the number of sortBlock evals for VA is 113631 and for VW 125438. Also, bubble sort is completely different than heap sort. Bubble sort is a O(n^2) algorithm. Here's the basic bubble sort: | lastSwap temp | lastSwap := lastIndex. [lastSwap > firstIndex] whileTrue: [temp := firstIndex. firstIndex to: lastSwap - 1 do: [:i | (sortBlock value: (self basicAt: i + 1) value: (self basicAt: i)) ifTrue: [self swap: i with: i + 1. temp := i]]. lastSwap := temp] I found this web page that has an animation of bubble, heap, and quick sorts: http://www.mcs.csuhayward.edu/~simon/handouts/sort/SortDemo.html John Brant |
Free forum by Nabble | Edit this page |