Smalltalk › Usenets › Dolphin Smalltalk

SortedCollection>>includes:

_‹ Previous Topic Next Topic _›

Classic

List

Threaded

23 messages Options

Ingo Blank

SortedCollection>>includes:

Hi,

while browsing the Containers sources, I discovered a performance leak.
SortedCollection>>includes: does a _linear_ search on a *sorted* collection.
We all(?) know however, that this can be done in O(log(n)) time complexity
using
a binary search algorithm.

I'm aware that in Smalltalk the simple algorithm is not applicable, because
the user can
define her/his own SortBlock.

But while preserving the sort property invariant, (i)(a1 <= a2 <= ...<= an)
respectively
(ii)(an <= ... <= a2 <= a1), for ascending/descending sorted collections, we
can use this modified version,
which determines with a simple test, whether the collection is in
ascending/descending order.

a1 <= an --> ascending, assuming sort property (i) holds (inductive closure)
an <= a1 --> descending, dto. for (ii)

Furthermore, a collection is sortable, iff the elements class provides a
"<=" relation.

So, if our collection holds the sort property invariant *and* our algorithm
uses these operators
(<,= or <=) only, we can use this algorithm. (q.e.d.)

Of course the sort order property can be broken by a pathological user
defined SortBlock,
but then, we have no *SortedCollection* at all because it simply isn't
sorted. Then and only then, the algorithm fails.

I think, this is a sufficient formal proof for introducing the
SequenceableCollection>>binarySearchFor: method and safely overwriting
SortedCollection>>includes:

------------------ SNIP ------------------

!SortedCollection methodsFor!

includes: target
^self binarySearchFor: target

The well known algorithm, modified for ascending/descending collections:

------------------ SNIP ------------------

!SequenceableCollection methodsFor!

binarySearchFor: aKey
|l r m found|
"wp: receiver has to be sorted"

l := 1.
r := self size.

found := false.

(self basicAt: l) <= (self basicAt: r) ifTrue: [ "ascending order"

[found not and: [l <= r]] whileTrue:[
m := (l + r) // 2.
(found := ((self basicAt: m) = aKey)) ifFalse:[
aKey < (self basicAt: m) ifTrue: [ r := m - 1]
ifFalse: [ l := m + 1]
]
]
]

ifFalse: [ "descending order"

[found not and: [l <= r]] whileTrue:[
m := (l + r) // 2.
(found := ((self basicAt: m) = aKey)) ifFalse:[
aKey < (self basicAt: m) ifTrue: [ l := m + 1]
ifFalse: [ r := m - 1]
]
]

].

^found

--------------- SNIP ----------------

I believe the examples dont' have to be commented.

oc := OrderedCollection new.
100000 timesRepeat:[oc add: (RandomString new: 10)].
sc := oc asSortedCollection.
dc := oc asSortedCollection sortBlock: [:sx :sy | sy <= sx].

Time millisecondsToRun:[1000 timesRepeat:[oc includes: oc any]] 4108
Time millisecondsToRun:[1000 timesRepeat:[sc includes: sc any]] 28
Time millisecondsToRun:[1000 timesRepeat:[dc includes: dc any]] 28

(N.B. Collection>>any picks a random element)

-------------- SNIP --------------------

After all you might think that this is not important, because Smalltalk has
much better methods for
"Set representations" than sorted collections. These use hash algorithms,
that are O(1), which is of
course transcender. Well, that's right, but if/when we already *have* a
SortedCollection, then
we definitely *should* use a better search operation than linear.

Bye
Ingo

John Brant

Re: SortedCollection>>includes:

>...
>
> I believe the examples dont' have to be commented.
>
> oc := OrderedCollection new.
> 100000 timesRepeat:[oc add: (RandomString new: 10)].
> sc := oc asSortedCollection.
> dc := oc asSortedCollection sortBlock: [:sx :sy | sy <= sx].
>
>
> Time millisecondsToRun:[1000 timesRepeat:[oc includes: oc any]] 4108
> Time millisecondsToRun:[1000 timesRepeat:[sc includes: sc any]] 28
> Time millisecondsToRun:[1000 timesRepeat:[dc includes: dc any]] 28

OK, but I think you need a different algorithm:

| selectors |
selectors := Dictionary new.
Object withAllSubclassesDo: [:each |
each selectors do: [:sel |
selectors at: sel put: (selectors at: sel ifAbsent: [0]) + 1]].
(selectors associations asSortedCollection: [:a :b | a value < b value])
includes: (selectors associationAt: #printOn:)

This should return true (and does under Dolphin's implementation), but
returns false under yours.

John Brant

Ingo Blank

Re: SortedCollection>>includes:

> OK, but I think you need a different algorithm:
>
> | selectors |
> selectors := Dictionary new.
> Object withAllSubclassesDo: [:each |
> each selectors do: [:sel |
> selectors at: sel put: (selectors at: sel ifAbsent: [0]) + 1]].
> (selectors associations asSortedCollection: [:a :b | a value < b value])
> includes: (selectors associationAt: #printOn:)
>
>
> This should return true (and does under Dolphin's implementation), but
> returns false under yours.
>
>
> John Brant
>

John,

thank you for this.

I was obviously fixed by the formal proof, so I neglected the concrete
implementation. My solution works with simple types only, not with compound
structures.

This can be easily fixed.

It introduces however a restriction, that binarySearchFor: has to be placed
into SortedCollection, not SequenceableCollection, because it makes use of
its sortBlock property.

Furthermore this new algorithm assumes that the sort block is expressed in
terms of "<" (or "<="). The original assumed, that the elements class
implemented "<" . I don't think that this is a problem in praxi.

Finally you can decide whether to overide SortedCollection>>includes: or
using binarySearchFor: ad hoc.

Thanks again
Regards

Ingo

---------------- SNIP ---------------------
!SortedCollection methodsFor!

binarySearchFor: aKey

|l r m found sb|

"wp: receiver has to be sorted,

sortBlock is expressed in terms of <= (or <)"

found := false.

l := 1.

r := self size.

sb := self sortBlock.

(sb value: (self basicAt: l) value: (self basicAt: r)) ifTrue: [ "ascending
order"

[found not and: [ l <= r]] whileTrue:[

m := (l + r) // 2.

(found := ((self basicAt: m) = aKey)) ifFalse:[

(sb value: aKey value: (self basicAt: m))

ifTrue: [ r := m - 1]

ifFalse:[ l := m + 1]

]

]

]

ifFalse: [ "descending order"

[found not and: [ l <= r]] whileTrue:[

m := (l + r) // 2.

(found := ((self basicAt: m) = aKey)) ifFalse:[

(sb value: aKey value: (self basicAt: m))

ifTrue: [ l := m + 1]

ifFalse:[ r := m - 1]

]

]

].

^found

-------------------- SNIP ------------------------

selectors := Dictionary new.

Object withAllSubclassesDo: [:each |

each selectors do: [:sel |

selectors at: sel put: (selectors at: sel ifAbsent: [0]) + 1]].

(selectors associations asSortedCollection: [:a :b | a value < b value])

includes: (selectors associationAt: #printOn:)

true

John Brant

Re: SortedCollection>>includes:

"Ingo Blank" <[hidden email]> wrote in message
news:3b1920fa$0$133$[hidden email]...

>
>
> > OK, but I think you need a different algorithm:
> >
> > | selectors |
> > selectors := Dictionary new.
> > Object withAllSubclassesDo: [:each |
> > each selectors do: [:sel |
> > selectors at: sel put: (selectors at: sel ifAbsent: [0]) + 1]].
> > (selectors associations asSortedCollection: [:a :b | a value < b value])
> > includes: (selectors associationAt: #printOn:)
> >
> >
> > This should return true (and does under Dolphin's implementation), but
> > returns false under yours.
>
> I was obviously fixed by the formal proof, so I neglected the concrete
> implementation. My solution works with simple types only, not with

compound
> structures.
>
> This can be easily fixed.
>
> It introduces however a restriction, that binarySearchFor: has to be
placed
> into SortedCollection, not SequenceableCollection, because it makes use of
> its sortBlock property.
>
> Furthermore this new algorithm assumes that the sort block is expressed in
> terms of "<" (or "<="). The original assumed, that the elements class
> implemented "<" . I don't think that this is a problem in praxi.

Not quite there yet:

(#('aaa' 'abb' 'acc') asSortedCollection: [:a :b | a first < b first])
includes: 'aaa'

Returns false (at least in my image -- the sorted collection is 'abb',
'acc', 'aaa'). I think the best you can do is find the left and right most
elements that is sort block equal to the object your are searching for, and
then do a linear search in this range. You can test if something is sort
block equal by evaluating the sort block both ways. For example, this should
return true if anObject is sort block equal to anotherObject:
(sortBlock value: anObject value: anotherObject) = (sortBlock value:
anotherObject value: anObject)

Also, you should use at: instead of basicAt: -- there may be empty space at
the beginning of the list.

John Brant

Ingo Blank

Re: SortedCollection>>includes:

"John Brant" <[hidden email]> schrieb im Newsbeitrag
news:dDaS6.13784$[hidden email]...

> "Ingo Blank" <[hidden email]> wrote in message
> news:3b1920fa$0$133$[hidden email]...
> >
> >
> > > OK, but I think you need a different algorithm:
> > >
> > > | selectors |
> > > selectors := Dictionary new.
> > > Object withAllSubclassesDo: [:each |
> > > each selectors do: [:sel |
> > > selectors at: sel put: (selectors at: sel ifAbsent: [0]) +

1]].
> > > (selectors associations asSortedCollection: [:a :b | a value < b
value])

> > > includes: (selectors associationAt: #printOn:)
> > >
> > >
> > > This should return true (and does under Dolphin's implementation), but
> > > returns false under yours.
> >
> > I was obviously fixed by the formal proof, so I neglected the concrete
> > implementation. My solution works with simple types only, not with
> compound
> > structures.
> >
> > This can be easily fixed.
> >
> > It introduces however a restriction, that binarySearchFor: has to be
> placed
> > into SortedCollection, not SequenceableCollection, because it makes use

of
> > its sortBlock property.
> >
> > Furthermore this new algorithm assumes that the sort block is expressed
in

> > terms of "<" (or "<="). The original assumed, that the elements class
> > implemented "<" . I don't think that this is a problem in praxi.
>
> Not quite there yet:
>
> (#('aaa' 'abb' 'acc') asSortedCollection: [:a :b | a first < b first])
> includes: 'aaa'
>
> Returns false (at least in my image -- the sorted collection is 'abb',
> 'acc', 'aaa'). I think the best you can do is find the left and right most
> elements that is sort block equal to the object your are searching for,

and
> then do a linear search in this range. You can test if something is sort
> block equal by evaluating the sort block both ways. For example, this
should
> return true if anObject is sort block equal to anotherObject:
> (sortBlock value: anObject value: anotherObject) = (sortBlock value:
> anotherObject value: anObject)
>
> Also, you should use at: instead of basicAt: -- there may be empty space
at
> the beginning of the list.
>
>
> John Brant
>
John,

basicAt: was introduced in OrderedCollection>>includes: for performance, so
it must be legal to use it in SortedCollection.

The reason why your somewhat tricky, but perfectly legal, example lets fail
my algorithm is, that it is undetermined whether the collection is in
ascending/descending order. So when it "runs" in the "wrong" direction by
accident, it fails.
Well, we catch this possibility and try twice, one for ascending order
and one for descending. O(log n) is not affected by this.

Your tip with sortBlockEqual was very valuable ;-)

(#('aaa' 'abb' 'acc') asSortedCollection: [:a :b | a first < b first])
includes: 'aaa'

--> true

------------------ SNIP ----------------------------

SortedCollection>>sortBlockEqual: a and: b

sortBlockEqual: a and: b

|sb|

sb := self sortBlock.

^(sb value: a value: b) = (sb value: b value: a)

-----------------------------------------------------------------------

SortedCollection>>binarySearchFor: aKey

binarySearchFor: aKey

|l r m found sb|

"wp: receiver has to be sorted,

sortBlock is expressed in terms of <= (or <)"

found := false.

l := 1.

r := self size.

sb := self sortBlock.

(self sortBlockEqual: (self basicAt: l) and: (self basicAt: r)) ifTrue:[
"neither nor a/d"

[found not and: [ l <= r]] whileTrue:[ "Try ascending"

m := (l + r) // 2.

(found := ((self basicAt: m) = aKey)) ifFalse:[

(sb value: aKey value: (self basicAt: m))

ifTrue: [ r := m - 1]

ifFalse:[ l := m + 1]

]

].

found ifFalse:[

l := 1.

r := self size].

"Try descending"

[found not and: [ l <= r]] whileTrue:[

m := (l + r) // 2.

(found := ((self basicAt: m) = aKey)) ifFalse:[

(sb value: aKey value: (self basicAt: m))

ifTrue: [ l := m + 1]

ifFalse:[ r := m - 1]

]

]

]

ifFalse:

["either ascending or descending"

(sb value: (self basicAt: l) value: (self basicAt: r)) ifTrue: [ "ascending
order"

[found not and: [ l <= r]] whileTrue:[

m := (l + r) // 2.

(found := ((self basicAt: m) = aKey)) ifFalse:[

(sb value: aKey value: (self basicAt: m))

ifTrue: [ r := m - 1]

ifFalse:[ l := m + 1]

]

]

]

ifFalse: [ "descending order"

[found not and: [ l <= r]] whileTrue:[

m := (l + r) // 2.

(found := ((self basicAt: m) = aKey)) ifFalse:[

(sb value: aKey value: (self basicAt: m))

ifTrue: [ l := m + 1]

ifFalse:[ r := m - 1]

]

]

]

].

^found

---------------------- SNIP ---------------------------

Thanks again for your valuable tips.

Regards

Ingo

John Brant

Re: SortedCollection>>includes:

> basicAt: was introduced in OrderedCollection>>includes: for performance,
so
> it must be legal to use it in SortedCollection.

However, you are assuming that the collection begins at 1. Try this:
#(a b c) asSortedCollection removeFirst; includes: #b

> The reason why your somewhat tricky, but perfectly legal, example lets
fail
> my algorithm is, that it is undetermined whether the collection is in
> ascending/descending order. So when it "runs" in the "wrong" direction by
> accident, it fails.
> Well, we catch this possibility and try twice, one for ascending order
> and one for descending. O(log n) is not affected by this.

OK, try this:
| collection |
collection := #('aa' 'ab' 'ac' 'ba' 'bb' 'bc') asSortedCollection: [:a
:b | a first < b first].
collection allSatisfy: [:each | collection includes: each]
This should return true, but with your code, it returns false. Your code
only handled my example test case which all elements were sort block equal.
It does not handle the general case of having several items that are sort
block equal to the item you are searching for. I think you will need to
modify your algorithm to go something like:
binarySearchFor: anObject
(self minimumIndexFor: anObject) to: (self maximumIndexFor:
anObject) do: [:i | (self at: i) = anObject ifTrue: [^true]].
^false
Now for the min/maximumIndexFor: methods, you can do your binary search.
I'll let you fill in the details for these methods.

John Brant

Ingo Blank

Re: SortedCollection>>includes:

"John Brant" <[hidden email]> schrieb im Newsbeitrag
news:dbeS6.15071$[hidden email]...
> > basicAt: was introduced in OrderedCollection>>includes: for performance,
> so
> > it must be legal to use it in SortedCollection.
>
> However, you are assuming that the collection begins at 1. Try this:
> #(a b c) asSortedCollection removeFirst; includes: #b
>

John,

I replace

l := 1.
r := (self size).

with

l := firstIndex.
r := lastIndex.

and continue using basicAt:.
That's what OrderedCollection>>includes: does. That was my fault.

I understand, that our correspondence here comes from a too strict (or
sloppy) understanding
of the term "sorted" at my side. What you introduce with your fail-cases are
partial key
ordering problems. Of course are they covered by the totaly free definable
SortedCollection
SortBlock. I should have defined "sorted" as "strictly sorted", where
strictly means no partial
keys. What the searchBlock compares, is what we are looking for. Not a part
of it.
As a consequence we would have to introduce a SortedCollection subclass
"StrictlySortedCollection".
I don't know whether this is desirable.

Of course are you right, that my algorithm fails for this kind of problems.
Undoubltly one needs to scan the partitioned collection in a linear manner,
to securely find an element in the most general case. I admit frankly, that
I didn't regognize
the possible existence of partial keys.

I believe that falling back a priori to this strategy is not optimal,
because by far not
all collections have this kind of ordering. What can we lose if trying a
binSearch first ?
IFF it fails, we can do a second run with that linear search strategy. If we
are unlucky,
we lose exactly this O(log n) time, what is not much. O(n + log n) worst
case, but
O(log n) if we are lucky.

On the other hand, if using the modified linear method, which does a
partioning via binSearch,
a priori, we invest too much and get possibly too few for it. O(n) in every
case.

If it would be my only choice either standard linear search as in includes:
, or a modified
"pre-binSearch", I wouldn't bother, because both are log(n).

Anyway, thanks for the interesting chat.

Ingo

David Simmons

Re: SortedCollection>>includes:

In reply to this post by Ingo Blank

"Ingo Blank" <[hidden email]> wrote in message
news:3b18e45a$0$135$[hidden email]...
>

Hi Ingo,

I concur with your assertions here. The QKS Smalltalk frameworks implemented
<SortedCollection> methods for:

#includes:
#indexOf:ifAbsent:
#includesEquivalent: (includes-entry-equivalent-to)
#includesIdentical: (includes-entry-identical-to)

"" Heart of the mechanism
#insertionIndexOf:

All using binary search mechanics. Almost all the above methods are
implemented in terms of #insertionIndexOf:.

The algorithms are carefully analyzed and tuned for Smalltalk performance --
i.e. depending on the collection size a linear scan may be used. As your
example benchmarks below suggest, such an approach makes a very significant
performance difference.

I recall that we were able to provide some 100-fold increase in parts of our
toolset just by using a <SortedCollection> instead of an
<OrderedCollection>.

Note, the algorithm for searching can (and should) be written in terms of
the <sortBlock> to avoid various issues you mention.

The algorithm provided here is from "SmalltalkAgents - QKS Smalltalk 98".
The newer SmallScript system uses a slightly different algorithm.

method behavior: SortedCollection
[
insertionIndexOf: tObject

Description: Return the index of the slot that would be used to insert
the given
object. If the list would have to be expanded to insert the value then
evaluate the
absent handler.

thisContext onSignalDo: [^self basicSize+1].

| width size base |

Preflight values
(*
base := 0. Default is <nil> anyway
*)
size := width := self basicSize.

Scan and bisect as long as their is a semispace available
[width] whileTrue:
[
| index |

index := (width // 2) + 1.

"If true then we scan the right semispace"
(sortBlock value: (self basicAt: base + index) value: tObject)
ifTrue:
[
Adjust the base to select the origin of right-hand semispace
base := base + index.

Converge the width cleverly (i.e., not "width :=
index"), taking advantage of integer division
truncation, to be the width of the right side removing
the value that was just tested.
width := width - index
]

"If false we scan the left semi-space"
ifFalse:
[
Converge the width to that of the left-hand semispace
removing the value that was just tested. We leave the
base as is since it is already at the start of the
left semispace.
width := index - 1
].
].

Return the location where the value would be inserted
^(base+1)
].

-- Dave S. [http://www.qks.st]

> Hi,
>
> while browsing the Containers sources, I discovered a performance leak.
> SortedCollection>>includes: does a _linear_ search on a *sorted*
collection.
> We all(?) know however, that this can be done in O(log(n)) time complexity
> using
> a binary search algorithm.
>
> I'm aware that in Smalltalk the simple algorithm is not applicable,
because
> the user can
> define her/his own SortBlock.
>
> But while preserving the sort property invariant, (i)(a1 <= a2 <= ...<=
an)
> respectively
> (ii)(an <= ... <= a2 <= a1), for ascending/descending sorted collections,
we
> can use this modified version,
> which determines with a simple test, whether the collection is in
> ascending/descending order.
>
> a1 <= an --> ascending, assuming sort property (i) holds (inductive
closure)
> an <= a1 --> descending, dto. for (ii)
>
> Furthermore, a collection is sortable, iff the elements class provides a
> "<=" relation.
>
> So, if our collection holds the sort property invariant *and* our
algorithm

> uses these operators
> (<,= or <=) only, we can use this algorithm. (q.e.d.)
>
> Of course the sort order property can be broken by a pathological user
> defined SortBlock,
> but then, we have no *SortedCollection* at all because it simply isn't
> sorted. Then and only then, the algorithm fails.
>
> I think, this is a sufficient formal proof for introducing the
> SequenceableCollection>>binarySearchFor: method and safely overwriting
> SortedCollection>>includes:
>
> ------------------ SNIP ------------------
>
> !SortedCollection methodsFor!
>
> includes: target
> ^self binarySearchFor: target
>
> The well known algorithm, modified for ascending/descending collections:
>
> ------------------ SNIP ------------------
>
> !SequenceableCollection methodsFor!
>
> binarySearchFor: aKey
> |l r m found|
> "wp: receiver has to be sorted"
>
> l := 1.
> r := self size.
>
> found := false.
>
> (self basicAt: l) <= (self basicAt: r) ifTrue: [ "ascending order"
>
> [found not and: [l <= r]] whileTrue:[
> m := (l + r) // 2.
> (found := ((self basicAt: m) = aKey)) ifFalse:[
> aKey < (self basicAt: m) ifTrue: [ r := m - 1]
> ifFalse: [ l := m + 1]
> ]
> ]
> ]
>
> ifFalse: [ "descending order"
>
> [found not and: [l <= r]] whileTrue:[
> m := (l + r) // 2.
> (found := ((self basicAt: m) = aKey)) ifFalse:[
> aKey < (self basicAt: m) ifTrue: [ l := m + 1]
> ifFalse: [ r := m - 1]
> ]
> ]
>
> ].
>
>
> ^found
>
>
> --------------- SNIP ----------------
>
> I believe the examples dont' have to be commented.
>
> oc := OrderedCollection new.
> 100000 timesRepeat:[oc add: (RandomString new: 10)].
> sc := oc asSortedCollection.
> dc := oc asSortedCollection sortBlock: [:sx :sy | sy <= sx].
>
>
> Time millisecondsToRun:[1000 timesRepeat:[oc includes: oc any]] 4108
> Time millisecondsToRun:[1000 timesRepeat:[sc includes: sc any]] 28
> Time millisecondsToRun:[1000 timesRepeat:[dc includes: dc any]] 28
>
> (N.B. Collection>>any picks a random element)
>
> -------------- SNIP --------------------
>
> After all you might think that this is not important, because Smalltalk

has

> much better methods for
> "Set representations" than sorted collections. These use hash algorithms,
> that are O(1), which is of
> course transcender. Well, that's right, but if/when we already *have* a
> SortedCollection, then
> we definitely *should* use a better search operation than linear.
>
>
>
> Bye
> Ingo
>
>
>
>

John Brant

Re: SortedCollection>>includes:

In reply to this post by Ingo Blank

"Ingo Blank" <[hidden email]> wrote in message
news:3b19ad22$0$139$[hidden email]...

> I understand, that our correspondence here comes from a too strict (or
> sloppy) understanding
> of the term "sorted" at my side. What you introduce with your fail-cases
are
> partial key
> ordering problems. Of course are they covered by the totaly free definable
> SortedCollection
> SortBlock. I should have defined "sorted" as "strictly sorted", where
> strictly means no partial
> keys. What the searchBlock compares, is what we are looking for. Not a
part
> of it.
> As a consequence we would have to introduce a SortedCollection subclass
> "StrictlySortedCollection".
> I don't know whether this is desirable.
>
> Of course are you right, that my algorithm fails for this kind of
problems.
> Undoubltly one needs to scan the partitioned collection in a linear
manner,
> to securely find an element in the most general case. I admit frankly,
that
> I didn't regognize
> the possible existence of partial keys.
>
> I believe that falling back a priori to this strategy is not optimal,
> because by far not
> all collections have this kind of ordering. What can we lose if trying a
> binSearch first ?
> IFF it fails, we can do a second run with that linear search strategy. If
we
> are unlucky,
> we lose exactly this O(log n) time, what is not much. O(n + log n) worst
> case, but
> O(log n) if we are lucky.

O(n + log n) is O(n).

> On the other hand, if using the modified linear method, which does a
> partioning via binSearch,
> a priori, we invest too much and get possibly too few for it. O(n) in
every
> case.

I don't understand why we would be investing too much here since it is also
bounded by O(n) and o(log n). Anyway, here's a version that finds the left
most position, and then does a linear scan from there as long as the
elements are sort block equal:

binaryIncludes: anObject
| includesEqual start end middle |
self size = 0 ifTrue: [^false].
includesEqual := sortBlock value: anObject value: anObject.
start := firstIndex.
end := lastIndex.
[end - start > 1] whileTrue:
[middle := (start + end) // 2.
(includesEqual
ifTrue: [sortBlock value: anObject value: (self basicAt:
middle)]
ifFalse: [(sortBlock value: (self basicAt: middle) value:
anObject) not])
ifTrue: [end := middle]
ifFalse: [start := middle]].
start ~~ end
ifTrue:
[anObject = (self basicAt: start) ifTrue: [^true] ifFalse:
[start := end]].

[anObject = (self basicAt: start) ifTrue: [^true].
start := start + 1.
start <= lastIndex and:
[includesEqual
ifTrue: [sortBlock value: (self basicAt: start) value:
anObject]
ifFalse: [(sortBlock value: anObject value: (self basicAt:
start)) not]]]
whileTrue.
^false

I tested this with a 10000 symbol collection, testing the includes for each
symbol:
32 seconds for normal Dolphin #includes:
600 ms for #binaryIncludes: when the symbols are sorted by #<=
12 seconds for #binaryIncludes: when the symbols are sorted by [:a :b |
a first <= b first]
102 seconds for #binaryIncludes: when the symbols aren't sorted (e.g.,
[:a :b | false])

BTW, Dolphin can't sort 10000 elements if you have a sortBlock of [:a :b |
true]. It has a stack overflow.

Anyway, the times depends on your collection type. If your sort block is
highly selective, then you should get good performance, otherwise it might
be a lot worse. Of course, using a Set is still a lot faster than the best
case #binaryIncludes:.

Now, here's my final test case that makes it hard to use #binaryIncludes:
for normal #includes:
'abc' asSortedCollection includes: 3
My draft of the ANSI standard says that includes: can only return true or
false, and cannot raise an error. If you try to evaluate the sort block, you
will get an error.

John Brant

Ingo Blank

Re: SortedCollection>>includes:

John,

> I don't understand why we would be investing too much here since it is
also
> bounded by O(n) and o(log n).

Yes, that was a misunderstanding at my side.

> Anyway, here's a version that finds the left
> most position, and then does a linear scan from there as long as the
> elements are sort block equal:

> binaryIncludes: anObject
> | includesEqual start end middle |
> self size = 0 ifTrue: [^false].
> includesEqual := sortBlock value: anObject value: anObject.
> start := firstIndex.
> end := lastIndex.
> [end - start > 1] whileTrue:
> [middle := (start + end) // 2.
> (includesEqual
> ifTrue: [sortBlock value: anObject value: (self basicAt:
> middle)]
> ifFalse: [(sortBlock value: (self basicAt: middle) value:
> anObject) not])
> ifTrue: [end := middle]
> ifFalse: [start := middle]].
> start ~~ end
> ifTrue:
> [anObject = (self basicAt: start) ifTrue: [^true] ifFalse:
> [start := end]].
>
> [anObject = (self basicAt: start) ifTrue: [^true].
> start := start + 1.
> start <= lastIndex and:
> [includesEqual
> ifTrue: [sortBlock value: (self basicAt: start) value:
> anObject]
> ifFalse: [(sortBlock value: anObject value: (self basicAt:
> start)) not]]]
> whileTrue.
> ^false
>
> I tested this with a 10000 symbol collection, testing the includes for

each
> symbol:
> 32 seconds for normal Dolphin #includes:
> 600 ms for #binaryIncludes: when the symbols are sorted by #<=
> 12 seconds for #binaryIncludes: when the symbols are sorted by [:a :b
|
> a first <= b first]
> 102 seconds for #binaryIncludes: when the symbols aren't sorted (e.g.,
> [:a :b | false])

Well, I believe that is a very good result. And of course *much* better than
the
OrderedCollection>>includes:. It has no trade off compared with the native
binarySearchFor: and covers the general case. We can't get more.

That was exactly my intention, but my first tries were obviously a little
bit too naive,
because I underestimated the power of the sortBlock...

> BTW, Dolphin can't sort 10000 elements if you have a sortBlock of [:a :b |
> true]. It has a stack overflow.
>
> Anyway, the times depends on your collection type. If your sort block is
> highly selective, then you should get good performance, otherwise it might
> be a lot worse. Of course, using a Set is still a lot faster than the best
> case #binaryIncludes:.

That's what I said. Smalltalk has much transcender mechanisms for efficient
Set representation than SortedCollections. But if/when we have generated
one, we
should use their properties.

> Now, here's my final test case that makes it hard to use #binaryIncludes:
> for normal #includes:
> 'abc' asSortedCollection includes: 3
> My draft of the ANSI standard says that includes: can only return true or
> false, and cannot raise an error. If you try to evaluate the sort block,
you
> will get an error.

As far as I understand this, the exceptions come from comparing
incompatible data types. (char with int in your example).
Could you wrap the compares in catch blocks, and return false if/when an
exception occurs ?

> John Brant

Bye and thanks
Ingo

David Simmons

Re: SortedCollection>>includes:

In reply to this post by David Simmons

"Paolo Bonzini" <[hidden email]> wrote in message
news:[hidden email]...

> > The QKS Smalltalk frameworks implemented <SortedCollection> methods for:
> >
> > #includes:
> > #indexOf:ifAbsent:
> >
> > "" Heart of the mechanism
> > #insertionIndexOf:
> >
> > All using binary search mechanics. Almost all the above methods are
> > implemented in terms of #insertionIndexOf:.
> >
> > Note, the algorithm for searching can (and should) be written in terms
> > of the <sortBlock> to avoid various issues you mention.
>
> That's plain. But the QKS Smalltalk implementation that you posted is
> wrong, because John's counterexamples will probably have it fail miserably
> (unless special code in #includes:, that you did not post, works around
> the bugs). In particular when I tried this simple test on GNU Smalltalk
> it failed
>
> | a sc |
> a := #('aa' 'ac' 'ab' 'bc' 'bb' 'ba' 'cc' 'ca' 'cb').
> sc := a asSortedCollection: [ :x :y | x first <= y first ].
> ^a allSatisfy: [ :each | sc includes: each ]

The above does work correctly when I test it on QKS Smalltalk. I'm not sure
what you refer to as the #insertionIndexOf: "bugs" that need to be worked
around by the "include:/indexOf:ifAbsent:" method(s). The "bugs" that need
to be worked around are those which would be in an incorrect sortBlock
algorithm like the one above which collates using only the first character
and thus effectively leaves the collection "as is".

I performed the test with both the stock version which first attempts a
binary search for lists greater than 42 entries in size. And then I hacked
the code to perform a binary scan for lists greater than 1 entry in size.

Both variants worked correctly (because if the binary search fails they
[must necessarily] default to a linear scan).

I.e., the binary search is an optimization. There is no guarantee that a
<SortedList> is actually sorted properly -- that depends entirely on the
"correctness" of the sortBlock's algorithm. Which is "not correct" in the
above example.

"" #collect: ...
a map: [:e| sc insertionIndexOf: e].

"" Returns the correct values:
{4,4,4,7,7,7,10,10,10}

Here's the other pieces of code you wanted:
------------------------------------------
Method behavior: SortedCollection
[
indexOf: object
ifAbsent: absentHandler

"Description: Return the index of an object equal to the given object or
evaluate
the absentHandler if the object is not found."

| size |

"First, determine if we have any entries. Then based on that value
decide if a binary search is more efficient than a linear scan.
The empirical crossover point is around 42 depending on the type
of data and the form of testing (ident/equal/equiv)."
((size := self basicSize) ifZero: [^absentHandler value]) > 42 ifTrue:
[|index|
"Now, find the closest index"
index := self insertionIndexOf: object.

"Depending on whether the sortBlock includes an equals test we
will get the index or the index-1. Furthermore, the
sortblock test may not actually compare the objects so this
is only an optimization; if it fails we still need to
perform a full linear scan."
((self basicAt: (index min: size)) = object) ifTrue: [^index].
((self basicAt: (index-1 max: 1)) = object) ifTrue: [^index-1].
].

"Perform a linear scan"
1 to: size do:
[:i |
((self basicAt: i) = object) ifTrue: [^i].
].

"If we failed after the thorough scan evaluate the absentHandler"
^absentHandler value
].

Method behavior: SortedCollection
[
includes: object

"Description: Return a boolean value indicating whether the receiver
includes and
object equal to the given element."

^(self indexOf: object ifAbsent: []) asBoolean
].

-- Dave S. [www.smallscript.net]

>
> I worked around the problem this morning with a little extra code (about
> 50 lines); I also had to add exception handling which I guess is what the
> send to thisContext does in your code. For performance, I split the
> binary search mechanisms in a special method that only uses sort blocks
> and is used when the element is intended to be in the collection (#add:
> and so forth), and another that does all the linear search crap.
>
> SortedCollection implementation is far from intuitive if you want to do it
> efficiently. As an additional example, consider that you have to provide
> equally good mechanisms for indexed accessing (usually you do a
> pre-sorting pass for this) and for incremental construction (where you
> usually use trees, either balanced or not)! In GNU Smalltalk I coalesce
> successive #add: operations, support a heap data structure if they are
> only using the SortedCollection as a priority queue, and have two sorting
> algorithms (heapsort and quicksort). This data structure gives O(n log n)
> behavior unless the client frequently interleaves adding and searching.
>
> Paolo
>
>
>
> --
> Posted from IDENT:[hidden email] [131.175.16.193]
> via Mailgate.ORG Server - http://www.Mailgate.ORG

David Simmons

Re: SortedCollection>>includes:

I should probably conjecture that one could potentially avoid the fallback
to the linear scan by pursuing some line of reasoning regarding a test of
the form (during the index scan):

(sortBlock value: (self at: a) value: (self at: b)
== sortBlock value: (self at: b) value: (self at: a)) ifTrue: [
"" bogus sort-block so do a linear scan
].

Which using a SmallScripts ability to use a more terse syntax
can also be written as:

if (sortBlock(self[a],self[b]) == sortBlock(self[b],self[a]))
[
].

-- Dave S. [www.smallscript.net]

P.S., arghh! I have to train myself to stop reflex typing "their" when I
meant to type "there"; or practice being less hasty in my post-proofreading.
It is irritating to write the post, send it, then read-it and immediately
spot such stupid typos.

"David Simmons" <[hidden email]> wrote in message
news:Ji9T6.65719$%[hidden email]...
> "Paolo Bonzini" <[hidden email]> wrote in message
> news:[hidden email]...
> > > The QKS Smalltalk frameworks implemented <SortedCollection> methods
for:

> > >
> > > #includes:
> > > #indexOf:ifAbsent:
> > >
> > > "" Heart of the mechanism
> > > #insertionIndexOf:
> > >
> > > All using binary search mechanics. Almost all the above methods are
> > > implemented in terms of #insertionIndexOf:.
> > >
> > > Note, the algorithm for searching can (and should) be written in terms
> > > of the <sortBlock> to avoid various issues you mention.
> >
> > That's plain. But the QKS Smalltalk implementation that you posted is
> > wrong, because John's counterexamples will probably have it fail

miserably

> > (unless special code in #includes:, that you did not post, works around
> > the bugs). In particular when I tried this simple test on GNU Smalltalk
> > it failed
> >
> > | a sc |
> > a := #('aa' 'ac' 'ab' 'bc' 'bb' 'ba' 'cc' 'ca' 'cb').
> > sc := a asSortedCollection: [ :x :y | x first <= y first ].
> > ^a allSatisfy: [ :each | sc includes: each ]
>
> The above does work correctly when I test it on QKS Smalltalk. I'm not

sure

> what you refer to as the #insertionIndexOf: "bugs" that need to be worked
> around by the "include:/indexOf:ifAbsent:" method(s). The "bugs" that need
> to be worked around are those which would be in an incorrect sortBlock
> algorithm like the one above which collates using only the first character
> and thus effectively leaves the collection "as is".
>
> I performed the test with both the stock version which first attempts a
> binary search for lists greater than 42 entries in size. And then I hacked
> the code to perform a binary scan for lists greater than 1 entry in size.
>
> Both variants worked correctly (because if the binary search fails they
> [must necessarily] default to a linear scan).
>
> I.e., the binary search is an optimization. There is no guarantee that a
> <SortedList> is actually sorted properly -- that depends entirely on the
> "correctness" of the sortBlock's algorithm. Which is "not correct" in the
> above example.
>
> "" #collect: ...
> a map: [:e| sc insertionIndexOf: e].
>
> "" Returns the correct values:
> {4,4,4,7,7,7,10,10,10}
>
> Here's the other pieces of code you wanted:
> ------------------------------------------
> Method behavior: SortedCollection
> [
> indexOf: object
> ifAbsent: absentHandler
>
> "Description: Return the index of an object equal to the given object

> evaluate
> the absentHandler if the object is not found."
>
> | size |
>
> "First, determine if we have any entries. Then based on that value
> decide if a binary search is more efficient than a linear scan.
> The empirical crossover point is around 42 depending on the type
> of data and the form of testing (ident/equal/equiv)."
> ((size := self basicSize) ifZero: [^absentHandler value]) > 42 ifTrue:
> [|index|
> "Now, find the closest index"
> index := self insertionIndexOf: object.
>
> "Depending on whether the sortBlock includes an equals test we
> will get the index or the index-1. Furthermore, the
> sortblock test may not actually compare the objects so this
> is only an optimization; if it fails we still need to
> perform a full linear scan."
> ((self basicAt: (index min: size)) = object) ifTrue: [^index].
> ((self basicAt: (index-1 max: 1)) = object) ifTrue: [^index-1].
> ].
>
> "Perform a linear scan"
> 1 to: size do:
> [:i |
> ((self basicAt: i) = object) ifTrue: [^i].
> ].
>
> "If we failed after the thorough scan evaluate the absentHandler"
> ^absentHandler value
> ].
>
> Method behavior: SortedCollection
> [
> includes: object
>
> "Description: Return a boolean value indicating whether the receiver
> includes and
> object equal to the given element."
>
> ^(self indexOf: object ifAbsent: []) asBoolean
> ].
>
> -- Dave S. [www.smallscript.net]
>
> >
> > I worked around the problem this morning with a little extra code (about
> > 50 lines); I also had to add exception handling which I guess is what

the
> > send to thisContext does in your code. For performance, I split the
> > binary search mechanisms in a special method that only uses sort blocks
> > and is used when the element is intended to be in the collection (#add:
> > and so forth), and another that does all the linear search crap.
> >
> > SortedCollection implementation is far from intuitive if you want to do
it
> > efficiently. As an additional example, consider that you have to
provide
> > equally good mechanisms for indexed accessing (usually you do a
> > pre-sorting pass for this) and for incremental construction (where you
> > usually use trees, either balanced or not)! In GNU Smalltalk I coalesce
> > successive #add: operations, support a heap data structure if they are
> > only using the SortedCollection as a priority queue, and have two
sorting
> > algorithms (heapsort and quicksort). This data structure gives O(n log
n)

> > behavior unless the client frequently interleaves adding and searching.
> >
> > Paolo
> >
> >
> >
> > --
> > Posted from IDENT:[hidden email] [131.175.16.193]
> > via Mailgate.ORG Server - http://www.Mailgate.ORG
>
>

John Brant

Re: SortedCollection>>includes:

In reply to this post by David Simmons

"David Simmons" <[hidden email]> wrote in message
news:Ji9T6.65719$%[hidden email]...
> "Paolo Bonzini" <[hidden email]> wrote in message
> news:[hidden email]...

> > That's plain. But the QKS Smalltalk implementation that you posted is
> > wrong, because John's counterexamples will probably have it fail
miserably

sure
> what you refer to as the #insertionIndexOf: "bugs" that need to be worked
> around by the "include:/indexOf:ifAbsent:" method(s). The "bugs" that need
> to be worked around are those which would be in an incorrect sortBlock
> algorithm like the one above which collates using only the first character
> and thus effectively leaves the collection "as is".

Sorting only on the first character is perfectly valid. Here's the
properties of a sort block from the ANSI standard draft (revision 1.9, page
219):
-----
1. Given the same 2 parameters, the sort block must answer the same result.
2. The sort block must obey transitivity. For example, if a is before b, and
b is before c, then a must be before c.
-----

Now, you may be complaining that the sort block is using #<= instead of #<.
The standard draft seems to imply sort blocks should not return true for
equal elements ("... answers true if the first parameter should be ordered
before the second parameter, and false otherwise."). Both VW and Dolphin use
a default sort block of #<=. However, they would sort collections with many
of the same elements faster if they used #<. For example, in VW if I sort a
collection of 10,000 1's using the default sort block takes ~7.5 seconds on
my machine, but if I use a #< sort block, it takes ~45 milliseconds.

John Brant

David Simmons

Re: SortedCollection>>includes:

"John Brant" <[hidden email]> wrote in message
news:QIAT6.24663$[hidden email]...

>
> "David Simmons" <[hidden email]> wrote in message
> news:Ji9T6.65719$%[hidden email]...
> > "Paolo Bonzini" <[hidden email]> wrote in message
> > news:[hidden email]...
>
> > > That's plain. But the QKS Smalltalk implementation that you posted is
> > > wrong, because John's counterexamples will probably have it fail
> miserably
> > > (unless special code in #includes:, that you did not post, works

around
> > > the bugs). In particular when I tried this simple test on GNU
Smalltalk

> > > it failed
> > >
> > > | a sc |
> > > a := #('aa' 'ac' 'ab' 'bc' 'bb' 'ba' 'cc' 'ca' 'cb').
> > > sc := a asSortedCollection: [ :x :y | x first <= y first ].
> > > ^a allSatisfy: [ :each | sc includes: each ]
> >
> > The above does work correctly when I test it on QKS Smalltalk. I'm not
> sure
> > what you refer to as the #insertionIndexOf: "bugs" that need to be

worked
> > around by the "include:/indexOf:ifAbsent:" method(s). The "bugs" that
need
> > to be worked around are those which would be in an incorrect sortBlock
> > algorithm like the one above which collates using only the first
character
> > and thus effectively leaves the collection "as is".
>
> Sorting only on the first character is perfectly valid. Here's the
> properties of a sort block from the ANSI standard draft (revision 1.9,
page
> 219):
> -----
> 1. Given the same 2 parameters, the sort block must answer the same
result.
> 2. The sort block must obey transitivity. For example, if a is before b,
and
> b is before c, then a must be before c.
> -----

Hi John,

My language was not particularly precise in the previous post. In other
words, the <sortBlock> was perfectly valid and there was no ANSI
non-conformance.

What I was citing as being "wrong" was the basic algorithm did not actually
provide a proper "binary-sort" with respect to object "=" comparison used in
#include:. As I was suggesting, and as I think Paolo may have implemented,
one can do additional tests using the sortBlock during a binary search to
ascertain that it is not consistent with the "=" test and therefore
fall-back to a linear scan.

I.e.,

'ab' != 'ac'

"" But, comparing via the sortBlock
sortBlock('ab','ac') => true

"" And the reversed form:
sortBlock('ac','ab') => true

"" Would suggest that the collation requirements
"" for a binary sort will not hold true.

P.S.,
sortBlock(x,y) is just a SmallScript variant for
sortBlock value: x value: y.

And #~= is aliased to #!=. (so they are identical messages)

>
> Now, you may be complaining that the sort block is using #<= instead of
#<.
> The standard draft seems to imply sort blocks should not return true for
> equal elements ("... answers true if the first parameter should be ordered
> before the second parameter, and false otherwise."). Both VW and Dolphin
use
> a default sort block of #<=. However, they would sort collections with
many
> of the same elements faster if they used #<. For example, in VW if I sort
a
> collection of 10,000 1's using the default sort block takes ~7.5 seconds
on
> my machine, but if I use a #< sort block, it takes ~45 milliseconds.

I must confess that I don't understand why #< is faster than #<=. Our
default sortBlock does happen to use #<, but the #<= vs #< methods perform
at essentially identical speeds. It has been too long since I thought this
through, so I don't remember why I chose #< as opposed to #<=.

Given that we use a bisection search algorithm for insertion points, the
choice of #< vs #<= only results in a left-semispace bias instead of a
right-semispace bias for your example. I haven't actually run the example to
compare so I might be missing something?

In the case of virtual-oops (SmallIntegers, Characters), the JIT inlines all
the comparison forms so they should perform equivalently.

In the general case, <IComparable> objects must (minimally) implement a
#compare: method which returns (-1,0,1). The <IComparable> interface
provides default implementations of #<, #>, etc in terms of the #compare:
method.

An extended compare interface form may support additional features for
equivalence comparison such as case-insensitive comparison, or
human-language/region-script sensitive comparison, etc.

--- Curious now...
Hmmm. wait a minute. Choosing the left-semispace consistently will tend to
cause insertion to occur at the start of the collection which is distinctly
slower. So, based on our current implementation, it should be much faster to
use a #<= comparison. Ok, I did some seat-of-the-pants benchmarks and the
difference is very significant.

Case SmallScript(STS) VisualWorks
---- ----------- -----------
< 150ms 18ms
<= 19ms 2567ms

The cases are reversed for Smallscript/QKS-Smalltalk vis-a-vis VisualWorks
5i3.

Upon further investigation it depends, as I speculated, on the way in which
the insertion is handled.

As I speculated the #<= case is causing insertions to take place at the end
of the collection. Whereas, the #< case is causing insertions to take place
at the start of the collection.

In STS it is much cheaper to add to the last indexed slot of the collection
because no slots are copied down. Hence we see a time of ~19ms for the test.
Whereas adding to the end takes some 150ms because of all the slot
copying-up operations.

But in VW it is apparently much cheaper to add to the front of the
collection (unfortunately the algorithm source is pretty
convoluted to follow through its various methods (and there are no comments)
so I can't readily see exactly why).

I gather that the algorithm is trying to manage slot-resize-growth at the
Smalltalk layer using #become: etc. [this feature is built into the AOS
Platform VM mutator for all objects; where the resize/growth policy is basic
property for all objects which is inherited from its class by default].

So in VW we see that the #< which causes insertions at the start of the
collection takes about ~18ms. Whereas when it goes through its convoluted
code case to add to the end it takes 2567ms (which is a really long time
compared to 150ms for SmallScript). I'm guessing that however it is copying
down/up the slots it is fairly slow (probably doing range-checks and slot
index calculations etc).

See #insert: "anObject" before: "spot". Where <spot> is "1".
----

-- Dave S. [www.smallscript.net]

>
>
> John Brant
>
>

Blair McGlashan

Re: SortedCollection>>includes:

In reply to this post by John Brant

John

You wrote in message
news:QIAT6.24663$[hidden email]...

> ...
> The standard draft seems to imply sort blocks should not return true for
> equal elements ("... answers true if the first parameter should be ordered
> before the second parameter, and false otherwise.").

I thought the standard explicitly stated somewhere that the default sort
block relationship should be #<, hence the #todo note in
SortedCollection>>value:value: (which implements the default sort "block" in
Dolphin). Ah yes, pages 227 and 228 in the specs for <SortedCollection
factory> #new and #new:, "A sort block is supplied wich guarantees thtat the
elements will be sorted in ascending order as specified by the #< message
for the elements".

>...Both VW and Dolphin use
> a default sort block of #<=. However, they would sort collections with
many
> of the same elements faster if they used #<. For example, in VW if I sort
a
> collection of 10,000 1's using the default sort block takes ~7.5 seconds
on
> my machine, but if I use a #< sort block, it takes ~45 milliseconds.

Too true, but it is for backwards compatibility having been defined like
that in Smalltalk-80. Normally one wouldn't worry about it that much, except
that as a result people have often implemented #<= on their classes so that
they work with the default sort block. Changing it unilaterally would break
a lot of people's stuff I think (though I wish we'd changed it in the early
days to avoid perpetuating it).

Regards

Blair

John Brant

Re: SortedCollection>>includes:

"Blair McGlashan" <[hidden email]> wrote in message
news:9fni48$52nmn$[hidden email]...
> You wrote in message
> news:QIAT6.24663$[hidden email]...
> >...Both VW and Dolphin use
> > a default sort block of #<=. However, they would sort collections with
> many
> > of the same elements faster if they used #<. For example, in VW if I
sort
> a
> > collection of 10,000 1's using the default sort block takes ~7.5 seconds
> on
> > my machine, but if I use a #< sort block, it takes ~45 milliseconds.
>
> Too true, but it is for backwards compatibility having been defined like
> that in Smalltalk-80. Normally one wouldn't worry about it that much,
except
> that as a result people have often implemented #<= on their classes so
that
> they work with the default sort block. Changing it unilaterally would
break
> a lot of people's stuff I think (though I wish we'd changed it in the
early
> days to avoid perpetuating it).

It isn't a hard bug to fix. I just looked at my Dolphin image, and there are
only 9 classes that define #<= but do not define #<. I think making people
fix their code would not be that hard. In fact, I bet running a script such
as:
| block |
block := [:cls |
((cls includesSelector: #<=) and: [(cls canUnderstand: #<) not])
ifTrue: [
| source |
source := cls sourceCodeAt: #<=.
cls compile: (source copyReplaceAll: '<=' with: '<')
categories: (cls compiledMethodAt: #<=) categories]].
Object withAllSubclasses do: block.
Object class withAllSubclasses do: block
would fix most classes.

John Brant

David Simmons

Re: SortedCollection>>includes:

In reply to this post by David Simmons

"Paolo Bonzini" <[hidden email]> wrote in message
news:[hidden email]...
> >Case SmallScript(STS) VisualWorks
> >---- ----------- -----------
> >< 150ms 18ms
> ><= 19ms 2567ms
>
> FWIW, GNU Smalltalk's times are about the same as SmallScript (171 ms with
> <, 21 ms with <=) on a Celeron/400 -- don't know what's your machine.

I'm a bit skeptical about those numbers. On the surface of it, the 21ms
number seems too fast without special casing support.

I just spent a couple hours with Blair McGlashan (Object-Arts' Dolphin)
comparing the implementations in each of the various Smalltalk dialects.
What we concluded is that the VisualAge approach (with deferred [heap]
sorting using a bubble-sort algorithm and binary searching) has the best
overall execution characteristics of the existing implementations.

But the important lessons were:

a) There is modest performance value to be gained in deferring the actual
sorting. But to really exploit this requires a careful strategy for
implementing and choosing a quicksort vs bubble-sort algorithm. If you're
not prepared to provide a careful strategy then using a bubble-sort
algorithm is distinctly more efficient (than what most current Smalltalk's
provide) for both fringe cases and random element cases.

b) The overall driver on performance is actually the cost of shuffling
elements around in the collection (not the comparison operations). As a
result using techniques like #basicNRCSlotAt: (from SmallScript), or using
primitives to shift slot elements up or down in a collection makes a
dramatic performance difference.

All in all, this is counter intuitive to what a C/C++ developer would
expect. The reason is that C/C++ doesn't perform dereferencing and range
checking within the code portions of a sort/insertion function that moves
objects/entries around. In Smalltalk the cost of moving those elements
around typically far outweighed the block-value operations.

To allow designs that compete with C/C++ requires either a (C/C++) primitive
or the using a JIT supported basicNRC... mechanism to allow elimination of
the range checking and dereferencing code. The availability of a basicNRC
mechanism allows writing all the code in Smalltalk. The use of deferred
sorts allows even more efficiency in selecting and implementing merge
algorithms.

In any case, the machine we did the tests on has a 1.2GHz AMD Duron running
Windows 2000 Service Pack 2 [with Netmeeting running on the machine].

We tested the most recently available versions of VisualWorks, VisualAge,
Dolphin, SmalltalkMT, Squeak, and SmallScript/SmalltalkAgents. (Blair, I
tested squeak after we quit NetMeeting).

-- SIDEBAR --
FYI for other testers, squeak really eats idle cycles and typically skews
benchmarks in other applications by 30%. So don't try benchmarks with squeak
running; the reverse is not true. I.e., having other Smalltalks running when
you benchmark squeak has nominal impact.
-- END SIDEBAR --

The JIT based Smalltalks were distinctly faster in their fast cases, the
Smalltalks with primitives for shuffling slot-elements were much faster than
those using equivalent "pure classic smalltalk" algorithms without such
facilities [whether they were jitted or not].

Assuming all things being equal (which is not really true), the GNU
Smalltalk numbers you gave for a 400MHz machine would roughly scale to be
7ms and 57ms on our test machine. I suspect that achieving those (7ms)
numbers requires special case code and C/C++ code (unless one has a JIT
supporting basicNRC...).

So my question to you is:

Do you have special case code in for the degenerate cases?

Do you have a JIT?

Are there any primitives involved in the collection element insertion or
sorting code?

OVERALL LESSON: Design of the algorithms is very important especially in
relation to the characteristics of a given Smalltalk implementation.

P.S., perhaps the most interesting side-thing about our efforts was that we
found wild differences in the browser facilities and the ease of use factors
among the Smalltalks. The same sequences in one system did not translate to
another and in many cases various natural/intuitive features were missing
between various Smalltalks.

I.e., browsing skills in one smalltalk IDE did not directly translate to
other smalltalk IDEs. We understood the features we wanted and how to
navigate, but actually finding [if they were even present] the right
elements (menus, keystrokes, tools) and figuring out how to use them
properly was not directly transferable from one system to another.

Cheers,

-- Dave S. [www.smallscript.net]

>
> Now I cannot really give any meaning to this, since the algorithms that
> GNU Smalltalk employ are quite sophisticated and based on heaps. What's
> more, that building the heap takes 50% of the time <=, and 95% with <.
> Now I must go and do some profiling!
>
> Paolo
>
>
> --
> Posted from IDENT:[hidden email] [131.175.16.193]
> via Mailgate.ORG Server - http://www.Mailgate.ORG

John Brant

Re: SortedCollection>>includes:

"David Simmons" <[hidden email]> wrote in message
news:30dU6.74274$%[hidden email]...
> "Paolo Bonzini" <[hidden email]> wrote in message
> news:[hidden email]...
> > >Case SmallScript(STS) VisualWorks
> > >---- ----------- -----------
> > >< 150ms 18ms
> > ><= 19ms 2567ms
> >
> > FWIW, GNU Smalltalk's times are about the same as SmallScript (171 ms
with

> > <, 21 ms with <=) on a Celeron/400 -- don't know what's your machine.
>
> I'm a bit skeptical about those numbers. On the surface of it, the 21ms
> number seems too fast without special casing support.
>
> I just spent a couple hours with Blair McGlashan (Object-Arts' Dolphin)
> comparing the implementations in each of the various Smalltalk dialects.
> What we concluded is that the VisualAge approach (with deferred [heap]
> sorting using a bubble-sort algorithm and binary searching) has the best
> overall execution characteristics of the existing implementations.

Actually, VA doesn't have any bubble-sort algorithm in it. It is a heap
sort. The *bubble* methods are maintaining the heap by moving objects
up/down in the heap. VA also uses a merge sort when performing large
#addAll:'s.

Since I've always heard that quicksort is faster than heapsort for almost
all real world problems, I decided to see why VA is faster. First I tried:
Time millisecondsToRun: [(10000 to: 1 by: -1) asSortedCollection]
Time millisecondsToRun: [(1 to: 10000) asSortedCollection]
Sure enough VA was faster (VA would either return 0 or 10 ms to run the
first one -- I don't think VA has a good millisecond clock; it seems that
most numbers returned are multiples of 10).

Now, since I knew that the people that told me about quicksort were smart
people, I thought that there must be something wrong with my tests. After
thinking about it for awhile, I realized that VA wasn't really sorting the
collection, it was just making a heap. So I modified my tests by sending
#first to the sorted collection. This causes the collection to be sorted if
it isn't already. After adding the #first message, here are the times (ms)
that I got:
Collection VA VW
10000-1 70, 70, 60 28, 28, 29
1-10000 100, 90, 100 27, 28, 33
random (1-10000) 80, 171, 150 46, 43, 52
The random test used the same random literal array of integers on both
sides.

This fits my overall impression of VA vs. VW: when I run simple benchmarks,
VA seems faster, but when I run real code, VW seems faster.

One interesting test that I did not perform is the addAll: method. Since VA
uses a merge sort and VW resorts the whole list, it would be interesting to
see the performance of each.

> b) The overall driver on performance is actually the cost of shuffling
> elements around in the collection (not the comparison operations). As a
> result using techniques like #basicNRCSlotAt: (from SmallScript), or using
> primitives to shift slot elements up or down in a collection makes a
> dramatic performance difference.

This is the problem with heapsort. Although it is O(n log n), it does too
much shuffling elements up/down the heap.

> P.S., perhaps the most interesting side-thing about our efforts was that
we
> found wild differences in the browser facilities and the ease of use
factors
> among the Smalltalks. The same sequences in one system did not translate
to
> another and in many cases various natural/intuitive features were missing
> between various Smalltalks.
>
> I.e., browsing skills in one smalltalk IDE did not directly translate to
> other smalltalk IDEs. We understood the features we wanted and how to
> navigate, but actually finding [if they were even present] the right
> elements (menus, keystrokes, tools) and figuring out how to use them
> properly was not directly transferable from one system to another.

That's why I made the Refactoring Browser be the same in VA as it was in VW.
I wanted things to work the same between the two. Of course people who know
VA don't like this, but the people with a VW background have no problems
with it...

John Brant

David Simmons

Re: SortedCollection>>includes:

"John Brant" <[hidden email]> wrote in message
news:UDfU6.32299$[hidden email]...

> "David Simmons" <[hidden email]> wrote in message
> news:30dU6.74274$%[hidden email]...
> > "Paolo Bonzini" <[hidden email]> wrote in message
> > news:[hidden email]...
> > > >Case SmallScript(STS) VisualWorks
> > > >---- ----------- -----------
> > > >< 150ms 18ms
> > > ><= 19ms 2567ms
> > >
> > > FWIW, GNU Smalltalk's times are about the same as SmallScript (171 ms
> with
> > > <, 21 ms with <=) on a Celeron/400 -- don't know what's your machine.
> >
> > I'm a bit skeptical about those numbers. On the surface of it, the 21ms
> > number seems too fast without special casing support.
> >
> > I just spent a couple hours with Blair McGlashan (Object-Arts' Dolphin)
> > comparing the implementations in each of the various Smalltalk dialects.
> > What we concluded is that the VisualAge approach (with deferred [heap]
> > sorting using a bubble-sort algorithm and binary searching) has the best
> > overall execution characteristics of the existing implementations.
>
> Actually, VA doesn't have any bubble-sort algorithm in it. It is a heap
> sort. The *bubble* methods are maintaining the heap by moving objects
> up/down in the heap. VA also uses a merge sort when performing large
> #addAll:'s.
>
> Since I've always heard that quicksort is faster than heapsort for almost
> all real world problems, I decided to see why VA is faster. First I tried:
> Time millisecondsToRun: [(10000 to: 1 by: -1) asSortedCollection]
> Time millisecondsToRun: [(1 to: 10000) asSortedCollection]
> Sure enough VA was faster (VA would either return 0 or 10 ms to run the
> first one -- I don't think VA has a good millisecond clock; it seems that
> most numbers returned are multiples of 10).

You've got to watch it. Those numbers you're seeing are *not* correct.

VA uses lazy sorting so the time returned is not correct until you access an
element (and thus force a sort). And, as you observed, the millisecond timer
values it returns are suspect since they are multiples of 10 which does not
correlate with making OS calls to the various time/timer services (or
directly using x86 cycle timers).

Try the following in both VA and VW:

"" Must access an element to force sort
|c| := SortedCollection sortBlock: [:a :b| a < b].
Time millisecondsToRun: [
10000 timesRepeat: [c add: 1]. c at: 1.
].

"" Must access an element to force sort
|c| := SortedCollection sortBlock: [:a :b| a <= b].
Time millisecondsToRun: [
10000 timesRepeat: [c add: 1]. c at: 1.
].

"" Must access an element to force sort
|c| := SortedCollection sortBlock: [:a :b| a < b].
Time millisecondsToRun: [
10000 timesRepeat: [c add: 1; at: 1]
].

"" Must access an element to force sort
|c| := SortedCollection sortBlock: [:a :b| a <= b].
Time millisecondsToRun: [
10000 timesRepeat: [c add: 1; at: 1]
].

====
| values |

values := [(10000 to: 1 by: -1).

"" Must access an element to force sort
Time millisecondsToRun: [values asSortedCollection at: 1].

"" Must access an element to force sort
|c| := SortedCollection sortBlock: [:a :b| a < b].
Time millisecondsToRun: [
c addAll: values; at: 1.
].

"" Must access an element to force sort
|c| := SortedCollection sortBlock: [:a :b| a <= b].
Time millisecondsToRun: [
c addAll: values; at: 1.
].

====
Try it again with <values> full of random numbers or random
strings, characters.

>
> Now, since I knew that the people that told me about quicksort were smart
> people, I thought that there must be something wrong with my tests. After
> thinking about it for awhile, I realized that VA wasn't really sorting the
> collection, it was just making a heap. So I modified my tests by sending
> #first to the sorted collection. This causes the collection to be sorted
if
> it isn't already. After adding the #first message, here are the times (ms)
> that I got:
> Collection VA VW
> 10000-1 70, 70, 60 28, 28, 29
> 1-10000 100, 90, 100 27, 28, 33
> random (1-10000) 80, 171, 150 46, 43, 52
> The random test used the same random literal array of integers on both
> sides.

You're not seeing the VW degenerate case. The case is very clear when you
compare a #< to a #<= sortBlock where all the elements in the array are 1.
But it also shows up quite clearly in a Random collection of integers when
comparing #< to #<=.

>
> This fits my overall impression of VA vs. VW: when I run simple
benchmarks,
> VA seems faster, but when I run real code, VW seems faster.

My general experience with micro-benchmarks (which tend to be impervious to
vendor's higher level algorithm issues), is that VW is about +50% faster
than VA for many/most things. Under those same micro-benchmarks, SmallScript
seems to typically be about the same speed or sometimes a little faster than
VW.

>
> One interesting test that I did not perform is the addAll: method. Since
VA
> uses a merge sort and VW resorts the whole list, it would be interesting
to
> see the performance of each.

We did that.

>
> > b) The overall driver on performance is actually the cost of shuffling
> > elements around in the collection (not the comparison operations). As a
> > result using techniques like #basicNRCSlotAt: (from SmallScript), or
using
> > primitives to shift slot elements up or down in a collection makes a
> > dramatic performance difference.
>
> This is the problem with heapsort. Although it is O(n log n), it does too
> much shuffling elements up/down the heap.

No. The heap/bubble sort algorithm in VA performed quite close to VW in
terms of the number of calls it made to the sortBlock. We benchmarked the
number of sortBlock valuations in all cases.

Actually, heap/bubble sorting in VA performed distinctly better than VW on
average. QuickSort does not perform well when its elements are largely
sorted, a good design would account for that by not-resorting the portion of
the collection which was already sorted [something that none of the
implementations does]. I.e., none of them maintain both a sorted subset, and
a set of "to be lazily added" elements.

It didn't matter whether we added elements one at a time via #add: or a
bunch of elements via #addAll:. We used <Random> sets of numbers and sets of
all 1's and all 'a' (strings).

More interesting is that squeak has the what looked to be the *same* code as
VW for SortedCollection, except that squeak has a primitive for copying
indexed slot elements. So where VW took some ~2800ms, Squeak would take
~350ms.

-- Dave S. [www.smallscript.net]

>
> > P.S., perhaps the most interesting side-thing about our efforts was that
> we
> > found wild differences in the browser facilities and the ease of use
> factors
> > among the Smalltalks. The same sequences in one system did not translate
> to
> > another and in many cases various natural/intuitive features were
missing
> > between various Smalltalks.
> >
> > I.e., browsing skills in one smalltalk IDE did not directly translate to
> > other smalltalk IDEs. We understood the features we wanted and how to
> > navigate, but actually finding [if they were even present] the right
> > elements (menus, keystrokes, tools) and figuring out how to use them
> > properly was not directly transferable from one system to another.
>
> That's why I made the Refactoring Browser be the same in VA as it was in
VW.
> I wanted things to work the same between the two. Of course people who
know
> VA don't like this, but the people with a VW background have no problems
> with it...
>
>
> John Brant
>
>

John Brant

Re: SortedCollection>>includes:

"David Simmons" <[hidden email]> wrote in message
news:lxgU6.75498$%[hidden email]...
> "John Brant" <[hidden email]> wrote in message
> > Now, since I knew that the people that told me about quicksort were
smart
> > people, I thought that there must be something wrong with my tests.
After
> > thinking about it for awhile, I realized that VA wasn't really sorting
the
> > collection, it was just making a heap. So I modified my tests by sending
> > #first to the sorted collection. This causes the collection to be sorted
> if
> > it isn't already. After adding the #first message, here are the times
(ms)

> > that I got:
> > Collection VA VW
> > 10000-1 70, 70, 60 28, 28, 29
> > 1-10000 100, 90, 100 27, 28, 33
> > random (1-10000) 80, 171, 150 46, 43, 52
> > The random test used the same random literal array of integers on both
> > sides.
>
> You're not seeing the VW degenerate case. The case is very clear when you
> compare a #< to a #<= sortBlock where all the elements in the array are 1.
> But it also shows up quite clearly in a Random collection of integers when
> comparing #< to #<=.

I decided to look at it a little more. The degenerate case for VW ends up
recursively sorting an empty partition and a partition that is one less than
the current partition size. If you add this line of code to VW before it
recursively sorts its components, you can make the degenerate case the
optimal case:

[k < j and: [sortBlock value: (self basicAt: k) value: dij]] whileTrue:
[k := k + 1].

This is bumping up the right partition to find an element that is not sort
block equal to the median element. This doesn't seem to have any noticable
effect on normal sorting of numbers. It might cause the sort block to be
evaluated a few more times than necessary, so if you have a slow sort block
and all elements are distinct, then the other way would be faster.

Another optimization you could make is to not recurse into the larger
partition and handling it with a while loop instead. This will effective
bound the number of recursive sort message to log n. Of course Smalltalk
implementations are highly optimized for message sends, so this isn't a big
win. However, the current recursive implementations can overflow the stack
on some Smalltalk implementations.

> > > b) The overall driver on performance is actually the cost of shuffling
> > > elements around in the collection (not the comparison operations). As
a
> > > result using techniques like #basicNRCSlotAt: (from SmallScript), or
> using
> > > primitives to shift slot elements up or down in a collection makes a
> > > dramatic performance difference.
> >
> > This is the problem with heapsort. Although it is O(n log n), it does
too
> > much shuffling elements up/down the heap.
>
> No. The heap/bubble sort algorithm in VA performed quite close to VW in
> terms of the number of calls it made to the sortBlock. We benchmarked the
> number of sortBlock valuations in all cases.

The number of evals of the sortBlock isn't necessarily related to the number
of shuffles. For example, the number of at:put:'s in VA's heap sort of (1
to: 10000) is 237263, but for VW's quicksort it is 0 (swap:with: isn't
called). Whereas the number of sortBlock evals for VA is 113631 and for VW
125438.

Also, bubble sort is completely different than heap sort. Bubble sort is a
O(n^2) algorithm. Here's the basic bubble sort:
| lastSwap temp |
lastSwap := lastIndex.
[lastSwap > firstIndex] whileTrue:
[temp := firstIndex.
firstIndex to: lastSwap - 1
do:
[:i |
(sortBlock value: (self basicAt: i + 1) value: (self basicAt: i))
ifTrue:
[self swap: i with: i + 1.
temp := i]].
lastSwap := temp]

I found this web page that has an animation of bubble, heap, and quick
sorts:
http://www.mcs.csuhayward.edu/~simon/handouts/sort/SortDemo.html

John Brant