Memory Consumption for Complex Object Graphs


Memory Consumption for Complex Object Graphs

YossiDM
I was wondering if someone could explain how memory is allocated and consumed in Gemstone for complex object graphs. I am trying to decide if I should use GLASS for an upcoming project, but I have some concerns about if it is appropriate and how I can solve some architectural issues.

Namely, I have very deep or interconnected object graphs. I am concerned that when I bring these objects back, even parts of the graph I am not currently using will be faulted in, or at least retrieved from the Stone, in some situations.

For example, to simplify: Customer -> Orders -> Line Items. If I have an object in Smalltalk that follows this structure and I retrieve the customer but do not send any messages to the orders or line items collection, will they be allocated in memory or retrieved? I guess what I am getting at is: does GemStone allocate memory eagerly or lazily? If the memory is always allocated, I see this as a problem in my architecture, as there can be very deep object graphs, recursive relationships, and so on. Imagine an actual graph, for example (vertices and edges) - I don't want the entire graph retrieved by accident and allocated into memory, as it could include gigs of objects.

Is there a good way to solve this, potentially without sacrificing transparent persistence? Maybe some kind of lightweight container or a manual lazy approach? Up until now, I've been building simple models, so I haven't been as concerned. Thanks.

Re: Memory Consumption for Complex Object Graphs

Dale Henrichs
On 11/01/2010 09:18 AM, YossiDM wrote:

> [snip]

Yossi,

The short answer is that you don't need to worry about large
interconnected object graphs.

GemStone uses an object table for managing objects, which basically
means that you can load an object into vm memory without dragging in the
whole object structure. If you reference an instance variable of an
object, the "body" of that object is brought in without dragging in all
of its instance variables ... and so on.

In the vm itself, the OOPs are mapped to direct pointers, so that
traversing a graph of objects already loaded into vm memory is very
efficient.
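As an illustration of this faulting behavior, here is a sketch using the Customer example from the original question (class and accessor names are hypothetical; the faulting itself is transparent to the code):

```smalltalk
"Hypothetical model: only the objects actually touched are faulted
 into vm memory; the rest of the graph stays in the Stone."
| customer order |
customer := AllCustomers detect: [:c | c name = 'ACME'].
customer name.            "the Customer body is in memory; its orders
                           slot holds an OOP, not a loaded collection"
customer orders size.     "faults in the orders collection itself,
                           but not the individual Order bodies"
order := customer orders first.
order lineItems.          "only now is this one Order's line items
                           collection brought into memory"
```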

You also don't need to worry about how big Collections like Array,
OrderedCollection, and Dictionary grow, because we break large objects
(those greater than about 2000 slots) into chunks, so that only the
segment of the large object that you are touching is brought into
memory.

Of course, if you are going to traverse a very large object you will
ultimately load the whole object (or object graph) into memory. We have
indexed collections that allow you to do efficient queries on large
collections, so that you don't have to traverse the large collection to
find elements that match certain criteria.
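The indexed-collection approach might look like the sketch below. The exact index-creation protocol varies by GemStone/S version (this uses the classic equality-index API; `orders` and the `total` path are hypothetical), so check the documentation for your release:

```smalltalk
"Create an equality index on the 'total' instance variable path of the
 elements of a persistent collection (classic GemStone/S protocol)."
orders createEqualityIndexOn: 'total' withLastElementClass: Number.

"An optimized selection block (curly braces, path dot-notation) can use
 the index, so matching elements are found without faulting the whole
 collection into memory."
bigOrders := orders select: {:o | o.total > 1000}.
```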

All of this means that only your working set of objects is ever loaded
into vm memory.

Dale



Re: Memory Consumption for Complex Object Graphs

YossiDM
Thanks Dale!

My main concern was graph traversals. I planned to use indexes and streams as appropriate. Have any GemStone customers ever used GemStone as a database with some graph-like semantics or graph traversals? My thought was perhaps using a stream or some other data structure to ensure that as I traverse, unneeded parts of the result set drop out of memory immediately. I have some concern here about overloading the SPC if I don't do traversals correctly.

I hope I am not using a bad word for you guys here, but I noticed that Objectivity was building a product on top of their offering called InfiniteGraph, so I got the idea that I could build something similar to suit my needs on top of GLASS. They only support Java for now, and I am not very happy with the Smalltalk support in their core product.

I'd be happy to discuss some of those ideas further with you if interested over private channels. Thanks again for your quick feedback, always appreciated along with your great work.

Re: Memory Consumption for Complex Object Graphs

Dale Henrichs
On 11/01/2010 11:23 AM, YossiDM wrote:

> [snip]

Yossi,

I'm not sure that specialized data structures are required for graph
traversal ... Both the SPC and the vm are pretty good about dropping out
the least recently used objects/pages so I wouldn't worry about that
part specifically ...

When you mention result sets, though, you will need to manage the
results of your queries from the perspective that if the result set is
large, you'll need to consider committing on a regular basis to avoid
running out of temporary object space ... The same sort of behavior that
I mentioned before applies to result sets ... if an object is added to
a result set, the object can drop out of memory (if it hasn't been
modified), and the cost in the result set is a single slot that
references the oop of the object ...

Dirty objects are kept in vm memory, so you can run out of vm temp obj
space if you are modifying a lot of objects or building a giant
collection in memory ... the solution is to commit on a regular basis as
you create the result set ...
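A minimal sketch of the commit-as-you-go pattern described above (collection and selector names are illustrative; note that the result set must be reachable from a persistent root, otherwise committing cannot move it out of temporary object memory):

```smalltalk
"Commit every 1000 additions so that unmodified, committed objects can
 be dropped from temporary object memory as the scan proceeds."
| count results |
count := 0.
results := OrderedCollection new.
UserGlobals at: #ScanResults put: results.  "make the result set persistent"
hugeCollection do: [:each |
    each total > 1000 ifTrue: [
        results add: each.
        count := count + 1.
        count \\ 1000 = 0 ifTrue: [System commitTransaction]]].
System commitTransaction.
```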

With all of that said, there is nothing wrong with doing clever
traversals of your very large object graph to minimize the number of
nodes that you need to hit ...

Dale

Re: Memory Consumption for Complex Object Graphs

YossiDM
Thanks Dale. That sounds reasonable.  What would be the best way to go about checking during a traversal? Is it reasonable to check the amount of room in the SPC, or are any of those operations expensive? I can imagine this strategy may not make a lot of sense if more than one machine is adding to the memory at the same time. The same thing goes for the VM if more than one thread is allocating memory.

I would prefer to avoid exceptions, but if they do happen, I am assuming I just catch the out-of-memory exception and go from there.  I suppose I could, as a rule of thumb, just commit under some criteria: something primitive like every x iterations, or something more advanced. I'd of course try to just use streams and indexes first, but I don't think this is always 100% possible, especially since we have some ad-hoc field creation in the mix. Thanks.

Re: Memory Consumption for Complex Object Graphs

Dale Henrichs
On 11/01/2010 10:19 PM, YossiDM wrote:

> [snip]

For building a result set where the memory consumption is predictable
and you own the code building the result set, you can arrange to commit
every n entries ... if you are building complex result set objects of an
"unpredictable size", then you can create an almost-out-of-memory
handler, or use the one in MCPlatformSupport. Either way, take a look at
MCPlatformSupport class>>commitOnAlmostOutOfMemoryDuring:.
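Usage presumably looks something like this (the selector is quoted from the message above; the block contents are illustrative):

```smalltalk
"MCPlatformSupport ships with GLASS. The handler commits automatically
 when temporary object memory is nearly exhausted, so the block can
 build a large result set without running out of temp obj space."
MCPlatformSupport commitOnAlmostOutOfMemoryDuring: [
    hugeCollection do: [:each | self processAndRecord: each]].
```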

Dale