sorting array-like collections

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

sorting array-like collections

Eliot Miranda-2
Hi All,

     sorted: for Array is OK, but for Array-like subclasses of ArrayedCollection it's suboptimal.  For example:


32-bit Mac OS X 2.2GHz Core i7 MBP

[| samples r samplesArray |
samples := Bitmap new: 256 * 1024.
r := Random new.
1 to: samples size do:
[:i|
samples at: i put: (r next * (1 << 32)) asInteger - 1].
samplesArray := samples asArray.
(1 to: 3) collect: [:i| {[samples sorted] timeToRun. [samplesArray sorted] timeToRun}] #(#(266 243) #(293 239) #(413 247))]

64-bit Mac OS X 2.2GHz Core i7 MBP
| samples r samplesArray |
samples := Bitmap new: 256 * 1024.
r := Random new.
1 to: samples size do:
[:i|
samples at: i put: (r next * (1 << 32)) asInteger - 1].
samplesArray := samples asArray.
(1 to: 3) collect: [:i| {[samples sorted] timeToRun. [samplesArray sorted] timeToRun}] #(#(143 134) #(300 133) #(141 137))

My point is about the difference in sorting speed for Array vs Bitmap, but before that note that 64-bit is faster to sort because there are fewer 32-bit to LargePositiveInteger conversions, but the difference between the Bitmap vs Array sort is larger on 64-bit because the 64-bit generation scavenger has a performance problem (which is why I'm looking at the above, fixing the VM profiler to work for 64-bit address spaces).


What's going on here is that Array's sorted: is

Array>>#sorted: aSortBlockOrNil
^self copy sort: aSortBlockOrNil

but for other subclasses of ArrayedCollection it is

Collection>>#sorted: aSortBlockOrNil
^self asArray sort: aSortBlockOrNil

so there is a slow copy of the Bitmap or ByteArray to an Array with equivalent elements, but a shallow copy (which is fast) in Array>>#sorted: to sort an Array.

When you look at the timings above you can see that when a GC occurs due to the slow copy it takes much longer to sort the Bitmap, and that it always takes longer to sort the Bitmap than the Array.

For many subclasses of ArrayedCollection, essentially everything other than the RunArrays and CompiledMethod, which are specialised, there is no need to convert to an Array, and a direct access to a shallow copy would work just as well.

So shouldn't we instead implement this as

ArrayedCollection>>#sorted: aSortBlockOrNil
^self copy sort: aSortBlockOrNil

and in subclasses that are exceptions override as e.g.
RunArray>>#sorted: aSortBlockOrNil
^self asArray sort: aSortBlockOrNil

SparseLargeArray>>#sorted: aSortBlockOrNil
    something much cleverer....


?
_,,,^..^,,,_
best, Eliot