I was wondering if someone could explain how memory is allocated and consumed in GemStone for complex object graphs. I am trying to decide whether I should use GLASS for an upcoming project, but I have some concerns about whether it is appropriate and how I can solve some architectural issues.

Namely, I have very deep or interconnected object graphs. I am concerned that when I bring these objects back, even parts of the graph I am not currently using will be dereferenced or at least retrieved from the Stone in some situations.

For example, to simplify: Customer -> Orders -> Line Items. If I have an object in Smalltalk that follows this structure and I retrieve the customer but do not send any messages to the orders or line items collection, will they be allocated in memory or retrieved? I guess what I am getting at is: does GemStone allocate memory eagerly or lazily? If the memory is always allocated, I see this as a problem in my architecture, as there can be very deep object graphs, recursive relationships, and so on. Imagine an actual graph, for example (vertices and edges). I don't want the entire graph retrieved by accident and allocated into memory, as it could include gigabytes of objects.

Is there a good way to solve this, potentially without sacrificing transparent persistence? Maybe some kind of lightweight container or manual lazy approach? Up until now I've been building simple models, so I haven't been as concerned. Thanks.
On 11/01/2010 09:18 AM, YossiDM wrote:
> I was wondering if someone could explain how memory is allocated and consumed in GemStone for complex object graphs. I am trying to decide whether I should use GLASS for an upcoming project, but I have some concerns about whether it is appropriate and how I can solve some architectural issues.
>
> Namely, I have very deep or interconnected object graphs. I am concerned that when I bring these objects back, even parts of the graph I am not currently using will be dereferenced or at least retrieved from the Stone in some situations.
>
> For example, to simplify: Customer -> Orders -> Line Items. If I have an object in Smalltalk that follows this structure and I retrieve the customer but do not send any messages to the orders or line items collection, will they be allocated in memory or retrieved? I guess what I am getting at is: does GemStone allocate memory eagerly or lazily? If the memory is always allocated, I see this as a problem in my architecture, as there can be very deep object graphs, recursive relationships, and so on. Imagine an actual graph, for example (vertices and edges). I don't want the entire graph retrieved by accident and allocated into memory, as it could include gigabytes of objects.
>
> Is there a good way to solve this, potentially without sacrificing transparent persistence? Maybe some kind of lightweight container or manual lazy approach? Up until now I've been building simple models, so I haven't been as concerned. Thanks.

Yossi,

The short answer is that you don't need to worry about large interconnected object graphs. GemStone uses an object table for managing objects, which basically means that you can load an object into vm memory without dragging in the whole object structure. If you reference an instance variable of an object, the "body" of that object is brought in without dragging in all of its instance variables ... and so on.
In the vm itself, the OOPs are mapped to direct pointers, so that traversing a graph of objects already loaded into vm memory is very efficient.

You also don't need to worry about how big Collections like Array, OrderedCollection, and Dictionary grow, because we break large objects (those greater than about 2000 slots) into chunks so that only the segment of the large object that you are touching is brought into memory. Of course, if you are going to traverse a very large object you will ultimately load the whole object (or object graph) into memory.

We have indexed collections that allow you to do efficient queries on large collections, so that you don't have to traverse the large collection to find elements that match certain criteria.

All of this means that only your working set of objects is ever loaded into ...

Dale
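Editorial aside: the lazy loading Dale describes can be illustrated with a toy model. This is a minimal sketch in Python, not GemStone code. `ObjectTable`, `Ref`, `Proxy`, and `fault_in` are hypothetical names invented for the illustration, and the real vm works at a much lower level. The only point shown is that holding a reference (an OOP) to an object does not load that object's body until the reference is actually followed.

```python
class Ref:
    """A slot that holds only the id (OOP) of another object, not its body."""

    def __init__(self, oop):
        self.oop = oop


class ObjectTable:
    """Toy repository (hypothetical): maps an object id to its stored slots."""

    def __init__(self):
        self._bodies = {}   # oop -> {slot name: plain value or Ref}
        self.loaded = []    # oops whose bodies were brought into memory, in order

    def store(self, oop, **slots):
        self._bodies[oop] = slots

    def fault_in(self, oop):
        # Bring in the body of exactly one object; its Refs stay unresolved.
        self.loaded.append(oop)
        return Proxy(self, self._bodies[oop])


class Proxy:
    """In-memory view of one object; referenced objects load on first access."""

    def __init__(self, table, slots):
        self._table = table
        self._slots = dict(slots)

    def __getattr__(self, name):
        # Only reached for names not found normally, i.e. slot names.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            value = self._slots[name]
        except KeyError:
            raise AttributeError(name) from None
        if isinstance(value, Ref):
            value = self._table.fault_in(value.oop)
            self._slots[name] = value   # replace the id with a direct pointer
        return value


# Customer -> Orders -> Line Items, as in the question.
table = ObjectTable()
table.store(1, name="Acme", orders=Ref(2))
table.store(2, count=2, line_items=Ref(3))
table.store(3, count=5)

customer = table.fault_in(1)
customer_name = customer.name              # touches only the customer body
loaded_after_name = list(table.loaded)     # [1]
order_count = customer.orders.count        # following the reference loads orders
loaded_after_orders = list(table.loaded)   # [1, 2]; line items never loaded
```

Caching the resolved proxy in the slot mirrors, very loosely, the remark that OOPs are mapped to direct pointers once objects are in memory.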
Thanks Dale!
My main concern was graph traversals. I planned to use indexes and streams as appropriate. Have any GemStone customers ever used GemStone as a database with some graph-like semantics or graph traversals? My thought was perhaps to use a stream or some other data structure to ensure that, as I traverse, unneeded parts of the result set drop out of memory immediately. I have some concern here about overloading the SPC if I don't do traversals correctly.

I hope I am not using a bad word for you guys here, but I noticed that Objectivity was building a product on top of their offering called InfiniteGraph, so I got the idea that I could build something similar to suit my needs on top of GLASS. They only support Java for now, and I am not very happy with the Smalltalk support in their core product.

I'd be happy to discuss some of those ideas further with you over private channels if you are interested. Thanks again for your quick feedback, always appreciated along with your great work.
On 11/01/2010 11:23 AM, YossiDM wrote:
> Thanks Dale!
>
> My main concern was graph traversals. I planned to use indexes and streams as appropriate. Have any GemStone customers ever used GemStone as a database with some graph-like semantics or graph traversals? My thought was perhaps to use a stream or some other data structure to ensure that, as I traverse, unneeded parts of the result set drop out of memory immediately. I have some concern here about overloading the SPC if I don't do traversals correctly.
>
> I hope I am not using a bad word for you guys here, but I noticed that Objectivity was building a product on top of their offering called InfiniteGraph, so I got the idea that I could build something similar to suit my needs on top of GLASS. They only support Java for now, and I am not very happy with the Smalltalk support in their core product.
>
> I'd be happy to discuss some of those ideas further with you over private channels if you are interested. Thanks again for your quick feedback, always appreciated along with your great work.

Yossi,

I'm not sure that specialized data structures are required for graph traversal ... Both the SPC and the vm are pretty good about dropping out the least recently used objects/pages, so I wouldn't worry about that part specifically ...

When you mention result sets, though, you will need to manage the results of your queries from the perspective that if the result set is large you'll need to consider committing on a regular basis to avoid running out of temporary object space ...

The same sort of behavior that I mentioned before applies to result sets ... if the object is added to a result set, the object can drop out of memory (if it hasn't been modified) and the cost in the result set is a slot that references the oop of the object ...

Dirty objects are kept in vm memory, so you can run out of vm temp obj space if you are modifying a lot of objects or building a giant collection in memory ... the solution is to commit on a regular basis as you create the result set ...

With all of that said, there is nothing wrong with doing clever traversals of your very large object graph to minimize the number of nodes that you need to hit ...

Dale
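Editorial aside: the "commit on a regular basis as you create the result set" advice can be sketched as follows. This is a language-neutral illustration written in Python, not GemStone code. `ToySession`, `mark_dirty`, and `collect_reachable` are hypothetical names, and the session only counts uncommitted changes to show that a periodic commit bounds how many dirty objects accumulate at once.

```python
from collections import deque


class ToySession:
    """Hypothetical stand-in for a database session; it only counts changes."""

    def __init__(self):
        self.uncommitted = 0   # objects changed since the last commit
        self.peak = 0          # largest number of uncommitted changes seen
        self.commits = 0

    def mark_dirty(self):
        self.uncommitted += 1
        self.peak = max(self.peak, self.uncommitted)

    def commit(self):
        self.uncommitted = 0
        self.commits += 1


def collect_reachable(graph, start, session, commit_every=100):
    """Breadth-first traversal that commits after every `commit_every` results."""
    results, seen, queue = [], {start}, deque([start])
    while queue:
        node = queue.popleft()
        results.append(node)
        session.mark_dirty()             # growing the result set dirties it
        if len(results) % commit_every == 0:
            session.commit()
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    session.commit()                     # flush the final partial batch
    return results


# A chain of 250 vertices: 0 -> 1 -> ... -> 249
graph = {i: [i + 1] for i in range(249)}
session = ToySession()
found = collect_reachable(graph, 0, session, commit_every=100)
```

In this toy run the peak number of uncommitted changes never exceeds `commit_every`, no matter how long the chain is. In a real system the result collection would also have to be reachable from persistent storage for a commit to let it leave temporary memory; that detail is assumed here rather than modelled.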
Thanks Dale. That sounds reasonable. What would be the best way to go about checking during a traversal? Is it reasonable to check the amount of room in the SPC, or are any of those operations expensive? I can imagine this strategy may not make a lot of sense if more than one machine is adding to the memory at the same time. The same goes for the VM if more than one thread is allocating memory.

I would prefer to avoid exceptions, but if they do happen, I am assuming I would just catch the out-of-memory exception and go from there. I suppose I could, as a rule of thumb, just commit under some criterion: something primitive like every x iterations, or something more advanced. I'd of course try to just use streams and indexes first, but I don't think this is always 100% possible, especially since we have some ad-hoc field creation in the mix. Thanks.
On 11/01/2010 10:19 PM, YossiDM wrote:
> Thanks Dale. That sounds reasonable. What would be the best way to go about checking during a traversal? Is it reasonable to check the amount of room in the SPC, or are any of those operations expensive? I can imagine this strategy may not make a lot of sense if more than one machine is adding to the memory at the same time. The same goes for the VM if more than one thread is allocating memory.
>
> I would prefer to avoid exceptions, but if they do happen, I am assuming I would just catch the out-of-memory exception and go from there. I suppose I could, as a rule of thumb, just commit under some criterion: something primitive like every x iterations, or something more advanced. I'd of course try to just use streams and indexes first, but I don't think this is always 100% possible, especially since we have some ad-hoc field creation in the mix. Thanks.

For building a result set where the memory consumption is predictable and you own the code building the result set, you can arrange to commit every n entries ... if you are building complex result set objects of an "unpredictable size", then you can create an almost out of memory handler, or use the one in MCPlatformSupport.

Either way, take a look at MCPlatformSupport class>>commitOnAlmostOutOfMemoryDuring:.

Dale
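Editorial aside: for the "unpredictable size" case, the general idea of an almost-out-of-memory handler is to commit when memory use crosses a threshold rather than after a fixed count. The Python sketch below models only that idea. It does not reproduce what `MCPlatformSupport class>>commitOnAlmostOutOfMemoryDuring:` actually does, and `TempSpace`, `allocate`, and `on_almost_full` are hypothetical names made up for the illustration.

```python
class TempSpace:
    """Hypothetical model of a fixed-size temporary object space."""

    def __init__(self, capacity, threshold_pct=80, on_almost_full=None):
        self.capacity = capacity
        self.threshold_pct = threshold_pct
        self.on_almost_full = on_almost_full   # callback run near the limit
        self.used = 0
        self.peak = 0
        self.commits = 0

    def allocate(self, size):
        if self.used + size > self.capacity:
            raise MemoryError("temporary object space exhausted")
        self.used += size
        self.peak = max(self.peak, self.used)
        # Integer comparison: has usage reached threshold_pct of capacity?
        almost_full = self.used * 100 >= self.capacity * self.threshold_pct
        if almost_full and self.on_almost_full is not None:
            self.on_almost_full(self)

    def commit(self):
        # In this model a commit releases everything held so far.
        self.used = 0
        self.commits += 1


def commit_when_almost_full(space):
    space.commit()


space = TempSpace(capacity=1000, threshold_pct=80,
                  on_almost_full=commit_when_almost_full)

# Result objects of varying, unpredictable size (arbitrary units).
for size in [150, 200, 180, 120, 200, 90, 160, 200, 50, 190, 170]:
    space.allocate(size)
```

The handler in this model only helps while a single allocation is smaller than the headroom left above the threshold (here 200 units); a larger allocation would still raise `MemoryError`, so the threshold has to be chosen with the largest expected object in mind.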