Allowing Larger Collections by Reducing Growth Size

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Allowing Larger Collections by Reducing Growth Size

Runar Jordahl
For some scenarios, Collection>>growSize is too aggressive when
increasing the collection’s size: It doubles the size for each growth.
This has the consequence that collections with an element size >= 2^27
(128 MB) cannot grow.

Below is an example of code that fails when using a 32 bit image:

| size |
size := 2 raisedTo: 27.
(String new: size withAll: $a) growToAtLeast: size + 1

The string is of size 128 MB. The maximum object size is 256 MB, but
the 128 MB string cannot even grow to size 128 MB + 1.

I think that SequenceableCollection>>growToAtLeast: or
Collection>>growSize should be changed to allow growing objects beyond
128 MB. One sinple solution would be to change Collection>>growSize
like this:
                ^(self capacity min: 1048576) max: 2

This solves the problem. It limits with growth size to 1 MB. But the
consequence is that the underlying collection will perform the
(expensive) growth operation in SequenceableCollection>>changeSizeTo:
more often.

An alternative solution is to change
SequenceableCollection>>growToAtLeast: to the following:

          self growToAtLeast: anInteger growSize: self growSize

In addition, a new method
SequenceableCollection>>growToAtLeast:growSize: must be added:

                "Attempt to grow to size of anInteger + growSize. If
resulting collection is too large for implementation limits, reduce
growSize and retry."


                | maximumSize attemptedNewSize |


                anInteger <= self size ifTrue: [^self].


                maximumSize := ObjectMemory is64Bit
                               ifTrue: [72057594037927936
                         "2^56"]
                               ifFalse: [268435456
                   "2^28"].
                attemptedNewSize := anInteger + growSize.
                (attemptedNewSize > maximumSize and: [growSize > 0])
                               ifTrue: [self growToAtLeast: anInteger
growSize: growSize // 2]

This last solution will preserve the current policy for growth. It
also avoids hardcoding any limit for maximum growth.

Kind regards
Runar Jordahl
blog.epigent.com

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
Reply | Threaded
Open this post in threaded view
|

Re: Allowing Larger Collections by Reducing Growth Size

Runar Jordahl
My last mail missed the last statement in the method
SequenceableCollection>>growToAtLeast:growSize: . Below is the full
source code:

        "Attempt to grow to size of anInteger + growSize. If resulting
collection is too large for implementation limits, reduce growSize and
retry."

        | maximumSize attemptedNewSize |

        anInteger <= self size ifTrue: [^self].

        maximumSize := ObjectMemory is64Bit
                ifTrue: [72057594037927936 "2^56"]
                ifFalse: [268435456 "2^28"].
        attemptedNewSize := anInteger + growSize.
        (attemptedNewSize > maximumSize and: [growSize > 0])
                ifTrue: [self growToAtLeast: anInteger growSize: growSize // 2]
                ifFalse: [self changeSizeTo: attemptedNewSize]
_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
Reply | Threaded
Open this post in threaded view
|

Re: Allowing Larger Collections by Reducing Growth Size

Steven Kelly
At the end, growSize // 2 seems to result in rather a lot of recursive
calls and also unnecessary very expensive #grows. E.g. for the last 1023
bytes of possible growth we have:

2^29 requested -> 2^28 -> 2^27... 2^10 -> 2^9 -> success!
(512 bytes added)
2^29 requested -> 2^28 -> 2^27... 2^9 -> 2^8 -> success!
...
2^29 requested -> 2^28 -> 2^27... 2^2 -> 2^1 -> success!

so 20+21+22+...29 calls = 255 calls and 9 #grows for 1023 bytes of
growth

Not a big problem, but could be avoided by shortcutting to grow straight
to maximumSize once growSize is less than some sensible number: at least
1024, but probably bigger - more to avoid having to do lots of expensive
#grows than this recursion.

Cheers,
Steve

Runar Jordahl wrote:

> For some scenarios, Collection>>growSize is too aggressive when
> increasing the collection's size: It doubles the size for each growth.
> This has the consequence that collections with an element size >= 2^27
> (128 MB) cannot grow.
>
> "Attempt to grow to size of anInteger + growSize. If
> resulting collection is too large for implementation limits,
> reduce growSize and retry."
>
> | maximumSize attemptedNewSize |
>
> anInteger <= self size ifTrue: [^self].
>
> maximumSize := ObjectMemory is64Bit
> ifTrue: [72057594037927936 "2^56"]
> ifFalse: [268435456 "2^28"].
> attemptedNewSize := anInteger + growSize.
> (attemptedNewSize > maximumSize and: [growSize > 0])
> ifTrue: [self growToAtLeast: anInteger growSize:
> growSize // 2]
> ifFalse: [self changeSizeTo: attemptedNewSize]
> _______________________________________________
> vwnc mailing list
> [hidden email]
> http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
Reply | Threaded
Open this post in threaded view
|

Re: Allowing Larger Collections by Reducing Growth Size

Martin McClure-3
On 09/24/2012 02:08 AM, Steven Kelly wrote:

> At the end, growSize // 2 seems to result in rather a lot of recursive
> calls and also unnecessary very expensive #grows. E.g. for the last 1023
> bytes of possible growth we have:
>
> 2^29 requested -> 2^28 -> 2^27... 2^10 -> 2^9 -> success!
> (512 bytes added)
> 2^29 requested -> 2^28 -> 2^27... 2^9 -> 2^8 -> success!
> ...
> 2^29 requested -> 2^28 -> 2^27... 2^2 -> 2^1 -> success!
>
> so 20+21+22+...29 calls = 255 calls and 9 #grows for 1023 bytes of
> growth
>
> Not a big problem, but could be avoided by shortcutting to grow straight
> to maximumSize once growSize is less than some sensible number: at least
> 1024, but probably bigger - more to avoid having to do lots of expensive
> #grows than this recursion.

Seems simpler to just jump straight to maximumSize (actually, to a
suitable prime number very close to maximumSize -- you don't actually
want a table size that's a power of two) once a doubling would go past
maximumSize.

Regards,

-Martin
_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc