Making a sorted collection to a SortedCollection

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
24 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Making a sorted collection to a SortedCollection

Gunnar Martinsen
The problem I have is maybe more about SmallTalk than Dolphin, but I
suspect it to touch into the more core parts of SmallTalk, and that
might work differently for other Smalltalk machines. Since I use
Dolphin myself, I choose to post it here.

About the problem, I happen to have an OrderedCollection (oc) which I
know is sorted according to some sort criteria. I want to make this
into a SortedCollection (sc) with the same objetcs, but without
wasting time to calculate the correct position of every object when
they are inserted, as I already know their position. If I just use

  sc := oc asSortedCollection: aBlock

then I will not avoid calculating the position of every element. I
want somthing like

  sc := SortedCollection new: oc size.
  sc sortBlock: aBlock.
  1 to: oc size do: [ :index |
    sc at: index withoutCalculatingTheExactPositionPut: (oc at: index)
].

Possible?

Gunner Martinsen


Reply | Threaded
Open this post in threaded view
|

Re: Making a sorted collection to a SortedCollection

Ian Bartholomew-17
Gunnar,

>                           I want to make this
> into a SortedCollection (sc) with the same objetcs, but without
> wasting time to calculate the correct position of every object when
> they are inserted, as I already know their position.

I had a quick look in the image and couldn't see anything obvious that might
work.  It's easy enough to do though (assuming you _really_ want to :-) )

Add a method to SortedCollection to allow direct access to the underlying
collection.

SortedCollection>>privateAt: anIndex put: anObject
    ^super at: anIndex put: anObject

You can then just do -

oc := ((1 to: 10) , (10 to: 1 by: -1)) asOrderedCollection.
sc := SortedCollection ofSize: oc size.
oc keysAndValuesDo: [:index :object | sc privateAt: index put: object].
sc sortBlock: SortedCollection.

It's a bit picky about the order of doing the above - you have to reset the
sortBlock (which automatically does a reSort) after putting the values into
the collection or else you get a walkback.

Regards
    Ian


Reply | Threaded
Open this post in threaded view
|

Re: Making a sorted collection to a SortedCollection

Ian Bartholomew-17
Forget that last bit - I've just noticed SortedCollection>>setSortBlock. You
can just do

oc := ((1 to: 10) , (10 to: 1 by: -1)) asOrderedCollection.
sc := SortedCollection ofSize: oc size.
sc setSortBlock: SortedCollection.
oc keysAndValuesDo: [:index :object | sc privateAt: index put: object].

and end up with an SC that preserves the original order with no #reSort at
all which, on rereading your original post, was what you wanted of course!

Ian


Reply | Threaded
Open this post in threaded view
|

Re: Making a sorted collection to a SortedCollection

Chris Uppal-3
Ian, Gunnar,

> oc keysAndValuesDo: [:index :object | sc privateAt: index put: object].

Small addition: you don't need to add an extra method, #privateAt:put:,
since #basicAt:put: is already defined and works fine.

Another point, in answer to the original question:

> Possible?

a simpler (but not necessarily better) answer would be "no".  There's no way
to do it without violating SortedCollection's encapsulation.

    -- chris


Reply | Threaded
Open this post in threaded view
|

Re: Making a sorted collection to a SortedCollection

David Royal-2
Gunnar

Out of interest why would you want to do this ? Isnt the whole point of
Sorted Collections that they sort their contents against some predetermined
criteria?
"Chris Uppal" <[hidden email]> wrote in message
news:[hidden email]...

> Ian, Gunnar,
>
> > oc keysAndValuesDo: [:index :object | sc privateAt: index put: object].
>
> Small addition: you don't need to add an extra method, #privateAt:put:,
> since #basicAt:put: is already defined and works fine.
>
> Another point, in answer to the original question:
>
> > Possible?
>
> a simpler (but not necessarily better) answer would be "no".  There's no
way
> to do it without violating SortedCollection's encapsulation.
>
>     -- chris
>
>
>


Reply | Threaded
Open this post in threaded view
|

Re: Making a sorted collection to a SortedCollection

Dave Harris-3
In reply to this post by Gunnar Martinsen
[hidden email] (Gunnar Martinsen) wrote (abridged):
> About the problem, I happen to have an OrderedCollection (oc) which I
> know is sorted according to some sort criteria. I want to make this
> into a SortedCollection (sc) with the same objetcs, but without
> wasting time to calculate the correct position of every object when
> they are inserted, as I already know their position.

In other words, an optimisation. Have you measured to verify that the
extra sorting time is significant?

I am pretty sure that the default code will first add all the elements in
non-sorted order, and then sort them all at once - it'll be more efficient
than adding/sorting them one at a time.

The time for the sorting part itself depends on the algorithm used, of
course. Some algorithms are unnaturally faster for already-sorted data. Eg
an insertion sort will have little more than a linear-scan to check for
out-of-order elements. In other words, O(N) rather than O(N log N), which
is the same complexity as adding N elements in the first place. Quick-sort
is slower, but if it has the median-of-three-pivot feature then sorted
data will be its best case, and anyway, good quick-sorts switch to
insertion-sort when the size of sub-sequences gets small enough.

I don't know off-hand what algorithm Dolphin uses, and anyway you wanted a
generic answer. If SortedCollection doesn't manage already sorted data
quickly, I'd consider replacing it's algorithm with one which does. This
can be done without breaking the abstraction which SortedCollection
represents.

  Dave Harris, Nottingham, UK | "Weave a circle round him thrice,
      [hidden email]      |   And close your eyes with holy dread,
                              |  For he on honey dew hath fed
 http://www.bhresearch.co.uk/ |   And drunk the milk of Paradise."


Reply | Threaded
Open this post in threaded view
|

Re: Making a sorted collection to a SortedCollection

Gunnar Martinsen
In reply to this post by David Royal-2
"David Royal" <[hidden email]> wrote in message news:<YQt49.1738$[hidden email]>...
> Gunnar
>
> Out of interest why would you want to do this ? Isnt the whole point of
> Sorted Collections that they sort their contents against some predetermined
> criteria?

I am handed over some data externally which has to be an
OrderedCollection, but happens to be sorted. When I receive it, I want
to treat it as a SortedCollection as I might add, remove or modify
objects, and regularly need to present the collection in sorted order.
Time is all the way a crucial matter, the collection might be big.

Therefore, I will use the sort criteria of my SortedCollection all the
time, except when it's originally created from the previous
collection.

Gunnar


Reply | Threaded
Open this post in threaded view
|

Re: Making a sorted collection to a SortedCollection

Gunnar Martinsen
In reply to this post by Dave Harris-3
[hidden email] (Dave Harris) wrote in message news:<[hidden email]>...
> [hidden email] (Gunnar Martinsen) wrote (abridged):
> > About the problem, I happen to have an OrderedCollection (oc) which I
> > know is sorted according to some sort criteria. I want to make this
> > into a SortedCollection (sc) with the same objetcs, but without
> > wasting time to calculate the correct position of every object when
> > they are inserted, as I already know their position.
>
> In other words, an optimisation. Have you measured to verify that the
> extra sorting time is significant?

No, I have just noticed it takes time, much more than expected. I
should measure it anyway, I agree.

> I am pretty sure that the default code will first add all the elements in
> non-sorted order, and then sort them all at once - it'll be more efficient
> than adding/sorting them one at a time.

Your most probably right. I forgot that SortedCollection probably
reimplements #addAll: which is used in
Collection>>asSortedCollection:.

Thanks

Gunnar


Reply | Threaded
Open this post in threaded view
|

Re: Making a sorted collection to a SortedCollection

Chris Uppal-3
Gunnar Martinsen wrote:

> > I am pretty sure that the default code will first add all the elements
in
> > non-sorted order, and then sort them all at once - it'll be more
efficient
> > than adding/sorting them one at a time.
>
> Your most probably right. I forgot that SortedCollection probably
> reimplements #addAll: which is used in
> Collection>>asSortedCollection:.

Unfortunately the Dolphin implementation uses its standard quicksort
algorithm to implement #addAll (at least in the circumstances where the list
to be added is >1/3 the size of the existing sorted list), and quicksort
performs O(n^2) for sorted lists.  So in your case you are seeing
SortedCollection performing at its worst.

BTW, I've occasionally speculated that SortedCollection>>addFirst: and
#addLast: should not be #doNotImplement, but should treat the given position
as a *hint* about where the new element is likely to fit (so that it could,
e.g, start a binary chop, or even just a linear search, at that point).  In
your case you could add all the elements to a new SortedCollection with an
#addLast: loop.

    -- chris


Reply | Threaded
Open this post in threaded view
|

Re: Making a sorted collection to a SortedCollection

Chris Uppal-3
I wrote:

> quicksort
> performs O(n^2) for sorted lists.  So in your case you are seeing
> SortedCollection performing at its worst.

I have to correct myself.

As Dave Harris mentioned, a median-of-three quicksort will deal with
pre-sorted lists without going O(n^2), and Dolphin's quicksort does use
median-of-three and so should be approximately linear in the number of
comparisons (I think it's still O(n^2) in the number of recursive calls to
quicksort, so can't be *really* linear).

Constant factors (or the residual non-linearity) are relatively ugly
though -- #quicksortFrom:to: is about 7 times slower than
#insertsortFrom:to: for a previously sorted list of 10,000 integers, and
about 10 times slower for a sorted list of 1,000,000 Integers.

    -- chris


Reply | Threaded
Open this post in threaded view
|

Re: Making a sorted collection to a SortedCollection

John Brant
In reply to this post by Chris Uppal-3
"Chris Uppal" <[hidden email]> wrote in message
news:[hidden email]...
>
> Unfortunately the Dolphin implementation uses its standard quicksort
> algorithm to implement #addAll (at least in the circumstances where the
list
> to be added is >1/3 the size of the existing sorted list), and quicksort
> performs O(n^2) for sorted lists.  So in your case you are seeing
> SortedCollection performing at its worst.

This is only true if you use the standard quicksort algorithm. Dolphin uses
the median of three version, so if you have a sorted list, it will take O(n
lg(n)). For example, compare the running times of :

1)
    (1 to: 10000) asArray asSortedCollection

2)
    | coll |
    coll := SortedCollection new.
    (1 to: 10000) asArray do: [:i | coll add: i]

3)
    | coll |
    coll := Array new: 10000 withAll: 1.
    coll asSortedCollection

On my machine, the first one runs in ~40 ms and the second one takes ~60 ms,
but the third one takes ~5 seconds. The third one is n^2 -- the other two
aren't. With a change to the quicksortFrom:to: method, the last case can
become the fastest case and run in linear time.

While case 3 takes n^2 time, it isn't the slowest version for Dolphin.
Inserting a single element at the begining of the list is the slowest (still
O(n^2) time though):

4)
    | coll |
    coll := SortedCollection new.
    (10000 to: 1 by: -1) asArray do: [:i | coll add: i]

When you insert an element, it much find its position (lg(n)) and then move
all of the elements one position to the right to insert the element at the
front. On my machine it takes over 7 seconds to run.

In VisualWorks, case 1 & 4 both run in ~10 ms (or O(n lg(n))). VW makes the
inserting a single element at the front fast. However, if you insert an
element at the end (case 2), then it takes ~1.5 seconds. Dolphin makes room
at the end for inserting new elements and VW makes the room at the
beginning. VisualWorks also has the same problem with case 3 and runs that
case in ~1.8 seconds.


John Brant


Reply | Threaded
Open this post in threaded view
|

Re: Making a sorted collection to a SortedCollection

John Brant
In reply to this post by Chris Uppal-3
"Chris Uppal" <[hidden email]> wrote in message
news:[hidden email]...
>
> As Dave Harris mentioned, a median-of-three quicksort will deal with
> pre-sorted lists without going O(n^2), and Dolphin's quicksort does use
> median-of-three and so should be approximately linear in the number of
> comparisons (I think it's still O(n^2) in the number of recursive calls to
> quicksort, so can't be *really* linear).

The number of recursive calls will always be O(n) -- even in the O(n^2)
case. Essentially, it is creating an n element tree, and the edges are the
recursive calls. A tree of size n will have n-1 edges.

In the already sorted case using the median of three quicksort, we get a
balanced tree so each recursive call is dividing up the values into two
roughly equal groups. In the worst quicksort case, we are dividing the
values into two groups of very different sizes (one side we get 0 elements
and the other side we get n-1 elements). So, in the good case we get a tree
of height lg(n) and in the worst case we get a tree of height n. No matter
whether we have the good case or the bad case, each level of tree takes O(n)
time to perform the partitioning, so we get O(n lg(n)) time in the good case
and O(n^2) time in the worst.

> Constant factors (or the residual non-linearity) are relatively ugly
> though -- #quicksortFrom:to: is about 7 times slower than
> #insertsortFrom:to: for a previously sorted list of 10,000 integers, and
> about 10 times slower for a sorted list of 1,000,000 Integers.

Insertion sort of an already sorted collection is linear, but quicksort on
an already sorted collection is O(n lg(n)).


John Brant


Reply | Threaded
Open this post in threaded view
|

Re: Making a sorted collection to a SortedCollection

Dave Harris-3
In reply to this post by Chris Uppal-3
[hidden email] (Chris Uppal) wrote (abridged):
> BTW, I've occasionally speculated that SortedCollection>>addFirst: and
> #addLast: should not be #doNotImplement, but should treat the given
> position as a *hint* about where the new element is likely to fit
> (so that it could, e.g, start a binary chop, or even just a linear
> search, at that point).

I would expect #addLast: to have a post-condition that the new last
element is the item just added. Your suggestion breaks that.

However, I think it would be reasonable for #addLast: to verify that the
item belonged at the end and insert it there if it does, only raising an
error if it doesn't. This could actually be quicker because we'd never
bother doing the binary chop. And #addAllLast: could do the same - which
is even better, because it gains economies of scale.

    addAllLast: aCollection
        (aCollection isSorted: [sortBlock])
            ifFalse: [^self shouldNotImplement].
        aCollection isEmpty
            ifTrue: [^aCollection].
        (self isEmpty or: [sortBlock
                  value: self last
                  value: aCollection first])
            ifFalse: [^self shouldNotImplement].
        ^super addAllLast: aCollection

Here #isSorted: would be defined by SequenceableCollection. It takes O(N),
although for a SortedCollection it might be worth just checking that the
sortBlocks are the same. (I don't know how to do that in general.) It
might be worth having a special subclass of OrderedCollection which caches
the result.

I am also using SequenceGrowable>>addAllLast:, whose implementation is
ridiculously slow here. It would need to be overridden in
OrderedCollection along the same lines as #addAllFirst:. In other words,
to do the bounds check and resize just once, instead of for each element.

Given that, you'd get #addAllLast: to be O(N) for collections which are
already sorted.

  Dave Harris, Nottingham, UK | "Weave a circle round him thrice,
      [hidden email]      |   And close your eyes with holy dread,
                              |  For he on honey dew hath fed
 http://www.bhresearch.co.uk/ |   And drunk the milk of Paradise."


Reply | Threaded
Open this post in threaded view
|

Re: Making a sorted collection to a SortedCollection

Chris Uppal-3
Dave Harris wrote:

> I would expect #addLast: to have a post-condition that the new last
> element is the item just added. Your suggestion breaks that.

That's a valid POV.  Whether my suggestion is at fault for breaking the
contract depends on whether you see #shouldNotImplement as breaking the
contract too, and also what you see the contract as being.  I was thinking
of it as being 'add the element at the end, and then let it perculate to
where it belongs'.  Using a different selector would avoid the ambiguity,
but at the cost of adding a new selector which was not understood by the
other Collections.  I think that this sort of selector "punning" (same name,
not quite the same meaning) is quite common in the Collection heirarchy --
consider #do: on LookupTable and Set for instance.

> However, I think it would be reasonable for #addLast: to verify that the
> item belonged at the end and insert it there if it does, only raising an
> error if it doesn't.

A reasonable suggestion (but notice that it too is "renogotiating" the
contract).  I still think there's good reason for having a "hinted" version
(whatever it's called) as well.

> This could actually be quicker because we'd never
> bother doing the binary chop.

Thinking about it, I don't think the binary chop would be a good idea -- the
natural interpretation of the "hint" would be 'a linear search starting at
the {end/beginning} is the fastest way to find the correct slot'.

> Here #isSorted: would be defined by SequenceableCollection. It takes O(N),
> although for a SortedCollection it might be worth just checking that the
> sortBlocks are the same. (I don't know how to do that in general.)

I think an identity check would suffice, and be as general as is feasible.

    -- chris


Reply | Threaded
Open this post in threaded view
|

Re: Making a sorted collection to a SortedCollection

Chris Uppal-3
In reply to this post by John Brant
John Brant wrote

> The number of recursive calls will always be O(n) -- even in the O(n^2)
> case.  [etc]

Yup.  I shouldn't try to work stuff out in my head when I've not had any
sleep.

    -- chris


Reply | Threaded
Open this post in threaded view
|

Re: Making a sorted collection to a SortedCollection

Dave Harris-3
In reply to this post by Chris Uppal-3
[hidden email] (Chris Uppal) wrote (abridged):
> > I would expect #addLast: to have a post-condition that the new last
> > element is the item just added. Your suggestion breaks that.
>
> That's a valid POV.  Whether my suggestion is at fault for breaking the
> contract depends on whether you see #shouldNotImplement as breaking the
> contract too, and also what you see the contract as being.

I see #shouldNotImplement as the Smalltalk way of deleting the method.
It's roughly akin to making SortedCollection inherit from Object,
reimplementing all its inherited methods, but not implementing #addLast:.
It's not so much breaking the contract as refusing to make the contract in
the first place.

It means that SortedCollection has "broken the contract" of
OrderedCollection, but that's another matter.


> > However, I think it would be reasonable for #addLast: to verify that
> > the item belonged at the end and insert it there if it does, only
> > raising an error if it doesn't.
>
> A reasonable suggestion (but notice that it too is "renogotiating" the
> contract).

Yes, we can see it as strengthening the pre-condition.

So we're both changing the contract. The difference is that you are doing
so silently. In my version, if someone doesn't know about the changed
contract they will get a highly visible error message. In your version,
the code will appear to work, but will have done something different.

I suppose it is a matter of temperament which we prefer.

  Dave Harris, Nottingham, UK | "Weave a circle round him thrice,
      [hidden email]      |   And close your eyes with holy dread,
                              |  For he on honey dew hath fed
 http://www.bhresearch.co.uk/ |   And drunk the milk of Paradise."


Reply | Threaded
Open this post in threaded view
|

Re: Making a sorted collection to a SortedCollection

Blair McGlashan
In reply to this post by John Brant
"John Brant" <[hidden email]> wrote in message
news:VQP49.63224$D36.65170@rwcrnsc53...
> ....
> 3)
>     | coll |
>     coll := Array new: 10000 withAll: 1.
>     coll asSortedCollection
>
> On my machine, the first one runs in ~40 ms and the second one takes ~60
ms,
> but the third one takes ~5 seconds. The third one is n^2 -- the other two
> aren't. With a change to the quicksortFrom:to: method, the last case can
> become the fastest case and run in linear time.
> ...

John, can you remind me of the change, and I will apply it for the
forthcoming release. Having a collection to sort with a large number of
equal elements is quite a common case, particularly (it seems) in UI lists.

Regards

Blair


Reply | Threaded
Open this post in threaded view
|

Re: Making a sorted collection to a SortedCollection

John Brant
"Blair McGlashan" <[hidden email]> wrote in message
news:aj7p5n$18qpge$[hidden email]...

> "John Brant" <[hidden email]> wrote in message
> news:VQP49.63224$D36.65170@rwcrnsc53...
> > ....
> > 3)
> >     | coll |
> >     coll := Array new: 10000 withAll: 1.
> >     coll asSortedCollection
> >
> > On my machine, the first one runs in ~40 ms and the second one takes ~60
> ms,
> > but the third one takes ~5 seconds. The third one is n^2 -- the other
two
> > aren't. With a change to the quicksortFrom:to: method, the last case can
> > become the fastest case and run in linear time.
> > ...
>
> John, can you remind me of the change, and I will apply it for the
> forthcoming release. Having a collection to sort with a large number of
> equal elements is quite a common case, particularly (it seems) in UI
lists.

The basic idea is to skip over the elements that are next to the partition
element and are equal to the partition element. When you sort equal
elements, you end up having to recursively call sort with a parition of (lo
to: j - 1) and (i to: up) where j = up - 1, i = up, and the partition
element is at j. Now, if we find a j' where all elements from (j'+1 to: j)
are equal, then we just need to call the sort on (lo to: j') and (i to: up).
For the equal case j' = lo, so we are just calling sort on (lo to: lo) and
(up to: up).

Anyway, here's my changes:

*) Right after putting the parition element "a" at position j, add the
following line:
  [(j := j - 1) > lo and: [sortBlock value: a value: (self basicAt: j)]]
whileTrue.
We already know that everything from (lo to: j - 1) is <= the partition
element, so if the sort block says that the partition element is <= some
element in (lo to: j-1), then they must be equal (i.e., (sortBlock value: a
value: b) = (sortBlock value: b value: a) ==> a is sort equal to b).

*) Change the recursive sort condition to be:
  (up - i) < (j - lo)
before it was using (up - j) which seemed wrong anyway.

*) Finally, instead of "j - 1" in the ifTrue: and ifFalse: branches of the
recursive sort, use "j".


This will require extra calls to the sortBlock when the elements are
different so it may slow down that case somewhat, but for the all equal
case, it changes the running time from ~5 seconds to ~9 ms. If we change the
asSortedCollection to be asOrderedCollection, it runs in ~7 ms, so the
asSortedCollection is only 2 ms slower for 10000 elements.


John Brant


Reply | Threaded
Open this post in threaded view
|

Re: Making a sorted collection to a SortedCollection

Blair McGlashan
"John Brant" <[hidden email]> wrote in message
news:RpP59.70781$UU1.13216@sccrnsc03...
> ...[snip]
> The basic idea is to skip over the elements that are next to the partition
> element and are equal to the partition element. When you sort equal
> elements, you end up having to recursively call sort with a parition of
(lo
> to: j - 1) and (i to: up) where j = up - 1, i = up, and the partition
> element is at j. Now, if we find a j' where all elements from (j'+1 to: j)
> are equal, then we just need to call the sort on (lo to: j') and (i to:
up).
> For the equal case j' = lo, so we are just calling sort on (lo to: lo) and
> (up to: up).
>
> Anyway, here's my changes:
> ...[snip]...

Thanks John. It seems to add about 1 to 1.5% to the number of invocations of
the comparison block on large random sorts. The time overhead was of a
similar magnitude, but this would depend on the sort block. However it
reduces (in my tests) a former worst case of sorting 2500 integers with the
extreme sort block [:a :b | true], from over 600mS to just 3mS.

Outside our SUnits another example is the time to sort by the "Writeable"
column in the Source Browser when viewing all source objects (about 2000 in
my current image). This is column displays a boolean value with the vast
majority of items having the same value, the sort block is also expensive
since testing the writeability of the file involves an OS file system API
call. Without the change, I got the following results:

Initial Sort (click column)                8999mS
Invert Sort (click column again)        398132mS    (ouch, I had to go and
make the tea while this ran!)
Invert Sort (click column again)    7510mS

As you can see the sort into "descending" order took an absurdly long time.
With the change I got the following results:

Initial Sort    9075mS
Invert Sort    2451mS
Invert Sort    6481mS

So this confirms that the change very slightly slows down the random case,
but can massively increase the speed of some former worst cases.

Given that this is a general purpose sorting algorithm, I think that paying
a very small increase in times for sorting unsorted data is well worth
paying in order to speed up the worst cases by (at least) a couple of orders
of magnitude.

> ...
> *) Change the recursive sort condition to be:
>   (up - i) < (j - lo)
> before it was using (up - j) which seemed wrong anyway.

Hmmm, looking at the source of the algorithm I would say that was a
transcription error.

Attached is the revised sort implementation if anyone wants to try it.

Regards

Blair

-------------------------------
!SortedCollection methodsFor!

quicksortFrom: start to: stop
 "Private - Sort elements start through stop of self to be non-descending
according to sortBlock.
 Note that this is not a stable sort, so any current ordering will be lost."

 "Implementation Note: This is a part iterative, part recursive, Quicksort
implementation
 based on that from 'Numerical Recipes in C', Press, Teukolsky, et al.. It
is marginally faster
 (about 5-15%) on average than the traditional Smalltalk-80 sort, which also
seems to be
 some kind of modified quicksort, and about twice as fast in some cases. In
particular it
 exhibits better performance when used in conjunction with a #<= comparison,
e.g. the
 default sort block, especially when the collection is already sorted."

 | up lo |
 up := stop.
 lo := start.
 [up - lo > 5] whileTrue:
   ["Choose median and arrange so [lo+1] <= [lo] <= [up]"

   | i j k m temp a |
   k := lo + up bitShift: -1.
   m := lo + 1.
   temp := self basicAt: k.
   self basicAt: k put: (self basicAt: m).
   self basicAt: m put: temp.
   (sortBlock value: (self basicAt: up) value: temp)
    ifTrue:
     [self basicAt: m put: (self basicAt: up).
     self basicAt: up put: temp].
   (sortBlock value: (self basicAt: up) value: (self basicAt: lo))
    ifTrue:
     [temp := self basicAt: up.
     self basicAt: up put: (self basicAt: lo).
     self basicAt: lo put: temp].
   (sortBlock value: (self basicAt: lo) value: (self basicAt: m))
    ifTrue:
     [temp := self basicAt: lo.
     self basicAt: lo put: (self basicAt: m).
     self basicAt: m put: temp].

   "Partition...(note we must test that i and j remain in bounds because the
sort block may use <= or >=."
   i := m. "i.e. start from lo+2"
   j := up. "i.e. start from up-1"
   a := self basicAt: lo.

   [[i < j and: [sortBlock value: (self basicAt: (i := i + 1)) value: a]]
whileTrue.
   [j >= i and: [sortBlock value: a value: (self basicAt: (j := j - 1))]]
whileTrue.
   j < i]
     whileFalse:
      [temp := self basicAt: i.
      self basicAt: i put: (self basicAt: j).
      self basicAt: j put: temp].

   "Insert partitioning element"
   self basicAt: lo put: (self basicAt: j).
   self basicAt: j put: a.

   "Skip sort-equal elements to speed up worst cases - suggested by John
Brant"
   [(j := j - 1) > lo and: [sortBlock value: a value: (self basicAt: j)]]
whileTrue.

   "Recursively sort smaller sub-interval and process larger remainder on
the next loop iteration"
   up - i < (j - lo)
    ifTrue:
     [self quicksortFrom: i to: up.
     up := j]
    ifFalse:
     [self quicksortFrom: lo to: j.
     lo := i]].

 "When interval size drops below threshold perform an insertion sort
(quicker for small numbers of elements)"
 ^self insertsortFrom: lo to: up! !
!SortedCollection categoriesFor: #quicksortFrom:to:!algorithms!private! !


Reply | Threaded
Open this post in threaded view
|

Re: Making a sorted collection to a SortedCollection

John Brant
"Blair McGlashan" <[hidden email]> wrote in message
news:ajar03$19d6k1$[hidden email]...
> "John Brant" <[hidden email]> wrote in message
> news:RpP59.70781$UU1.13216@sccrnsc03...
> > ...[snip]
> > The basic idea is to skip over the elements that are next to the
partition
> > element and are equal to the partition element. When you sort equal
> > elements, you end up having to recursively call sort with a parition of
> (lo
> > to: j - 1) and (i to: up) where j = up - 1, i = up, and the partition
> > element is at j. Now, if we find a j' where all elements from (j'+1 to:
j)
> > are equal, then we just need to call the sort on (lo to: j') and (i to:
> up).
> > For the equal case j' = lo, so we are just calling sort on (lo to: lo)
and
> > (up to: up).
> >
> > Anyway, here's my changes:
> > ...[snip]...
>
> Thanks John. It seems to add about 1 to 1.5% to the number of invocations
of
> the comparison block on large random sorts. The time overhead was of a
> similar magnitude, but this would depend on the sort block. However it
> reduces (in my tests) a former worst case of sorting 2500 integers with
the
> extreme sort block [:a :b | true], from over 600mS to just 3mS.

You might increase the cut over point for the insertion sort. Currently, it
is set at 5. From the tests that I ran on the original sort algorithm, I
couldn't really notice any time difference between 5-13. The larger the cut
over point, the fewer times you need to execute the new while loop. For
example, with the cut over point at 5, sorting (1 to: 10000) will require
executing the new while loop 2047 times. But, if you increase the cut over
point to 9, you will cut the executions down to 1023. If you increase the
cut over to be 10-12, you may be able to cut the overhead to 0.5-0.75%.


John Brant


12