large collections

otto

large collections

Hi,

We've been using OrderedCollections to manage instances where we need
to persist them in our system. Some of these collections are getting
bigger now; I found one that has almost 200000 elements in it.

My impression is that OrderedCollection does not handle large numbers
of elements well. Is this correct? Should we be using another class?
What are the thresholds one should be considering when using
collections?

From previous versions of GemStone that I've worked with, IdentitySet
was quite efficient. Is that still recommended for large collections?

Thanks
Otto

Dale Henrichs

Re: large collections

Otto,

From a storage perspective, OrderedCollections are not a bad choice for large collections.

If you are doing linear scans of all of the elements in the collection or accessing them by index, then an OrderedCollection is just fine as well.

GemStone OrderedCollections grow without copying so there is no disadvantage there either.
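To illustrate what "grow without copying" means, here is a small Python model (not GemStone code, and not how GemStone actually lays out its pages) of a collection that appends into fixed-size pages and opens a new page only when the last one is full, so elements already stored are never moved. `PagedCollection` and the page size of 1000 are hypothetical choices made for the sketch.

```python
# Python model only: PagedCollection and PAGE_SIZE are hypothetical,
# not GemStone classes or settings.
PAGE_SIZE = 1000


class PagedCollection:
    """Append-only sequence stored as a list of fixed-size pages."""

    def __init__(self, page_size=PAGE_SIZE):
        self.page_size = page_size
        self.pages = []   # each page holds at most page_size elements
        self.size = 0

    def add(self, element):
        # Open a new page only when the last one is full; earlier pages
        # (and the elements in them) are left where they are.
        if not self.pages or len(self.pages[-1]) == self.page_size:
            self.pages.append([])
        self.pages[-1].append(element)
        self.size += 1
        return element

    def at(self, index):
        # 1-based indexing, mirroring Smalltalk's #at:.
        if not 1 <= index <= self.size:
            raise IndexError(index)
        page, offset = divmod(index - 1, self.page_size)
        return self.pages[page][offset]


# Usage: 200000 elements fill exactly 200 pages.
c = PagedCollection()
for i in range(200000):
    c.add(i)
first_page = c.pages[0]
c.add(-1)  # spills onto a new, 201st page; page 0 is untouched
```

In this model, indexed access is a simple page/offset calculation, and adding an element never touches earlier pages.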

The IdentitySet stores elements by oop, so you can gain a significant performance advantage for doing #includes: type operations, but there is not a big difference between OrderedCollection and IdentitySet when it comes to linear scans and disk footprint.
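The difference for #includes:-style lookups can be sketched with a Python analogy (not GemStone code). A sequence has to be scanned slot by slot, while a structure kept sorted by object identity can use binary search. Here `id()` stands in for the oop, and `OopSortedSet` and `linear_includes` are hypothetical names; GemStone's IdentitySet uses its own B-tree implementation, which this flat sorted list only approximates. Both lookups report how many elements they examined.

```python
class Item:
    """Stand-in for a persistent domain object."""


def linear_includes(elements, target):
    """Identity search by scanning every slot in order; returns (found, probes)."""
    probes = 0
    for element in elements:
        probes += 1
        if element is target:
            return True, probes
    return False, probes


class OopSortedSet:
    """Identity set kept sorted by id(), used here as a stand-in for the oop."""

    def __init__(self):
        self.oops = []      # sorted ids
        self.objects = []   # objects at the same positions (keeps them alive)

    def _search(self, oop):
        # Binary search (same result as bisect_left), counting comparisons.
        lo, hi, probes = 0, len(self.oops), 0
        while lo < hi:
            probes += 1
            mid = (lo + hi) // 2
            if self.oops[mid] < oop:
                lo = mid + 1
            else:
                hi = mid
        return lo, probes

    def add(self, obj):
        i, _ = self._search(id(obj))
        if i == len(self.oops) or self.oops[i] != id(obj):
            self.oops.insert(i, id(obj))
            self.objects.insert(i, obj)
        return obj

    def includes(self, obj):
        """Identity membership test; returns (found, probes)."""
        i, probes = self._search(id(obj))
        return i < len(self.oops) and self.oops[i] == id(obj), probes


# Usage: look up the last of 20000 objects both ways.
items = [Item() for _ in range(20000)]
s = OopSortedSet()
for item in items:
    s.add(item)
print(linear_includes(items, items[-1]))  # (True, 20000)
print(s.includes(items[-1]))              # found in at most 15 probes
```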

So I would say the decision to move away from OrderedCollection should be based on your access patterns ...

Dale

----- Original Message -----
| From: "Otto Behrens" <[hidden email]>
| To: "GemStone Seaside beta discussion" <[hidden email]>
| Sent: Thursday, July 26, 2012 2:51:10 AM
| Subject: [GS/SS Beta] large collections

otto

Re: large collections

Thanks.

> The IdentitySet stores elements by oop, so you can gain a significant performance advantage for doing #includes: type operations, but there is not a big difference between OrderedCollection and IdentitySet when it comes to linear scans and disk footprint.

Yes, includesIdentity: did seem to be slow.

I think our biggest issue is (not well written) tests that
assume linear addition. But that's fixable.
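One way to make such tests independent of insertion order (so they keep passing if the collection becomes an IdentitySet, which has no defined order) is to compare contents as a multiset of identities rather than position by position. A minimal Python sketch of the idea; `same_identities` and `Account` are hypothetical names, not part of any test framework:

```python
from collections import Counter


def same_identities(actual, expected):
    """True if both collections hold exactly the same objects (by identity),
    each the same number of times, regardless of order."""
    return Counter(map(id, actual)) == Counter(map(id, expected))


class Account:
    """Hypothetical domain object; default equality and hash are identity-based."""


a, b, c = Account(), Account(), Account()
stored = {a, b}  # a Python set, like an IdentitySet, promises no iteration order

# Brittle: assumes the first element added comes back first.
#   assert list(stored)[0] is a
# Robust: checks only which objects are present.
assert same_identities(stored, [b, a])
```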

Is #add: similar in performance for IdentitySet and OrderedCollection?

Dale Henrichs

Re: large collections

Otto,

#add: should be comparable ...

For an OrderedCollection, #add: tacks the element on at the end, so no copying is involved (a new page is added periodically).

For an IdentitySet, B-tree insertions do result in some copying as the B-tree structures are adjusted to accommodate the new element, but the heavy lifting is all done in C (using memory-based operations), so you might not be able to measure the difference :)
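To make the "some copying" concrete, here is a simplified Python model of just the leaf level of a B-tree (an approximation, not GemStone's implementation). Keys live in sorted leaves of bounded size: an insert shifts the keys after it within one leaf, and a full leaf is split by copying its upper half into a new leaf. `LeafLevel` and `LEAF_CAPACITY` are hypothetical, and interior nodes are replaced by a binary search over each leaf's first key. By contrast, appending at the end of a sequence moves nothing.

```python
import bisect
import random

LEAF_CAPACITY = 64  # hypothetical node size, for illustration only


class LeafLevel:
    """Sorted keys spread across bounded leaves, like a B-tree's bottom level."""

    def __init__(self, capacity=LEAF_CAPACITY):
        self.capacity = capacity
        self.leaves = []   # each leaf is a sorted list of keys
        self.firsts = []   # first key of each leaf, used to pick the target leaf
        self.copied = 0    # keys moved by in-leaf shifts and by splits

    def add(self, key):
        """Insert key if absent; return False for a duplicate."""
        if not self.leaves:
            self.leaves.append([key])
            self.firsts.append(key)
            return True
        # Rightmost leaf whose first key <= key (leaf 0 for a new minimum).
        li = max(bisect.bisect_right(self.firsts, key) - 1, 0)
        leaf = self.leaves[li]
        pos = bisect.bisect_left(leaf, key)
        if pos < len(leaf) and leaf[pos] == key:
            return False
        self.copied += len(leaf) - pos  # keys after pos shift one slot
        leaf.insert(pos, key)
        if len(leaf) > self.capacity:
            # Split: copy the upper half into a new leaf.
            half = len(leaf) // 2
            right = leaf[half:]
            del leaf[half:]
            self.copied += len(right)
            self.leaves.insert(li + 1, right)
            self.firsts.insert(li + 1, right[0])
        self.firsts[li] = leaf[0]
        return True


# Usage: 20000 distinct keys inserted in random order.
keys = random.Random(0).sample(range(10**6), 20000)
tree = LeafLevel()
for k in keys:
    tree.add(k)
```

In this model each insert copies at most one leaf's worth of keys plus one split, so the per-insert cost is bounded by the node size rather than by the size of the whole collection.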

Dale

----- Original Message -----
| From: "Otto Behrens" <[hidden email]>
| To: "GemStone Seaside beta discussion" <[hidden email]>
| Sent: Thursday, July 26, 2012 10:11:47 AM
| Subject: Re: [GS/SS Beta] large collections

otto

Re: large collections

Thanks!

On Thu, Jul 26, 2012 at 7:20 PM, Dale Henrichs <[hidden email]> wrote:
