Smalltalk › Squeak › Squeak - Dev

Implementing lazy loading for expandable collections (part 2)

_‹ Previous Topic Next Topic _›

Classic

List

Threaded

1 message

Max Leske

Implementing lazy loading for expandable collections (part 2)

Hi Alex

I thought the following little experiment might interest you.

I was trying to find out (as per your suggestion) how big the difference would be between:
- setting the array in OrderedCollection to nil and initializing it to size 10 on first access
- initializing the array in OrderedCollection to size 0.

Expectation: #grow will take time, ergo, using an empty array should be slower than using
a preallocated array.

Experiment (Pharo 1.3 on SqueakVM):

[ 10000 timesRepeat: [ | c | c := OrderedCollection new: 100000.
1 to: 10000 do: [ :i | c add: i ] ] ] timeToRun 17566

[ 10000 timesRepeat: [ | c | c := OrderedCollection new: 10000.
1 to: 10000 do: [ :i | c add: i ] ] ] timeToRun 19717

[ 10000 timesRepeat: [ | c | c := OrderedCollection new: 0.
1 to: 10000 do: [ :i | c add: i ] ] ] timeToRun 17598

[ 10000 timesRepeat: [ | c | c := OrderedCollection new: 1000.
1 to: 10000 do: [ :i | c add: i ] ] ] timeToRun 18084

Result: growing a collection is always faster (or at least nearly equally fast)!

Explanation: What makes preinitialized collections so slow is the fact the the firstIndex variable will be set to
the first third of the array (3333 in this case). When the last position in the array is reached, all elements
are copied to the beginning of the array (2n operations). This copy operation is really slow (which can be seen
from the first example with an oversized collection that does not require copying because the last position
of the array is never reached).

Experiment (Pharo 4 on PharoVM):
(I’m using bigger numbers here to compensate for the faster VM :) )

[ 10000 timesRepeat: [ | c | c := OrderedCollection new: 1000000.
1 to: 100000 do: [ :i | c add: i ] ] ] timeToRun "0:00:01:09.225"

[ 10000 timesRepeat: [ | c | c := OrderedCollection new: 100000.
1 to: 100000 do: [ :i | c add: i ] ] ] timeToRun "0:00:00:37.653"

[ 10000 timesRepeat: [ | c | c := OrderedCollection new: 0.
1 to: 100000 do: [ :i | c add: i ] ] ] timeToRun "0:00:00:40.152"

[ 10000 timesRepeat: [ | c | c := OrderedCollection new: 10000.
1 to: 100000 do: [ :i | c add: i ] ] ] timeToRun "0:00:00:42.121”

Result: growing a collection is easily fast enough, although minute performance gains are possible with preinitialized collections.

Explanation: #makeRoomAtLast has been improved. Also: firstIndex is now set to the beginning of the array, preventing the copy
operation that made the former measurements so slow.
The run with the oversized collection is only so slow because it apparently takes a lot of time to assign the large array (?).

Consequence of my little experiment: My initial implementation of the idea in you paper set the array in OrderedCollection to nil
under the assumption that grow operations are very costly. That choice leads to many changes in methods of OrderedCollection
where it is necessary to check if the array has been initialized yet. This introduces overhead and makes the code more error prone.
I see now that it makes much more sense to use an empty array, the performance lost by growing ist neglectable.

It is also interesting to note that there almost never seems to be a good reason to initialize an OrderedCollection with a large array.

Cheers,
Max