Hello Moose!

With the current memory limit of Pharo, and the size of the generated Moose models being potentially huge, maybe some of you have already thought about (or even experimented with) persistence solutions with query mechanisms that would instantiate FAMIX objects only “on demand”, in order to keep only part of a model in memory when working on a specific area.

If so, I would be really interested to hear about (or play with) it :)

At first look, I see that there is a MooseGroupStorage class. This kind of object answers some usual collection messages (add, remove, select, detect, ...). I guess that when we perform queries over a Moose model, or when we add or remove entity objects, we end up using this protocol.

So, if I wanted to implement a database persistence solution for Moose, my first feeling would be to implement a specific kind of “MooseGroupStorage” and to plug in there a communication layer with a database. Does it make sense?

I have not played with Moose for a long time (but I am back to play with it a lot more :)) and my vision of things may be naive. So do not hesitate to tell me if what I am saying sounds crazy, and to push me back onto the right path!

Has anyone already thought about solutions to deal with memory limits when generating big Moose models?

Cyrille Delaunay
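PS: a very rough sketch of what I have in mind. The class name and the selectors on the database side (store:, entitiesDo:) are purely illustrative, and the real MooseGroupStorage hooks may well be different:

MooseGroupStorage subclass: #DBMooseGroupStorage
    instanceVariableNames: 'dbConnection'
    classVariableNames: ''
    package: 'Moose-DB-Experiment'

DBMooseGroupStorage >> add: anEntity
    "Hand the entity to the database layer instead of keeping it in memory."
    dbConnection store: anEntity.
    ^ anEntity

DBMooseGroupStorage >> detect: aBlock ifNone: exceptionBlock
    "Naive version that streams entities back from the database one by one;
    a real implementation would translate aBlock into a database query."
    dbConnection entitiesDo: [ :each |
        (aBlock value: each) ifTrue: [ ^ each ] ].
    ^ exceptionBlock value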
Hi Cyrille,
Long time no see!

On 30/03/17 10:07, Cyrille Delaunay wrote:
> With the current memory limit of Pharo, and the size of the generated
> Moose models being potentially huge, maybe some of you have already
> thought about (or even experimented with) persistence solutions with
> query mechanisms that would instantiate FAMIX objects only “on demand”,
> in order to keep only part of a model in memory when working on a
> specific area.
>
> If so, I would be really interested to hear about (or play with) it :)

The current FAMIX based models are not suitable for large models.
The inheritance based modeling results in very large, nearly empty objects.

Moose models tend to be highly connected and tend to be used with badly
predictable access patterns. That makes "standard databases" a bad match,
especially if you cannot push querying to them.

We are very close to having 64-bit Moose everywhere, shifting the problem
from the size of the model directly to speed. As the VM uses only one
native thread and 8-thread machines are everywhere, the best speed-up
should be expected from splitting the model over multiple Pharo images,
and possibly over multiple machines.

Stephan
Hi Stephan,

thanks for your thoughts (further comments below)

On 30/03/2017 13:31, Stephan Eggermont wrote:
> Hi Cyrille,
> Long time no see!
>
> On 30/03/17 10:07, Cyrille Delaunay wrote:
>> With the current memory limit of Pharo, and the size of the generated
>> Moose models being potentially huge, maybe some of you have already
>> thought about (or even experimented with) persistence solutions with
>> query mechanisms that would instantiate FAMIX objects only “on demand”,
>> in order to keep only part of a model in memory when working on a
>> specific area.
>>
>> If so, I would be really interested to hear about (or play with) it :)
>
> The current FAMIX based models are not suitable for large models.
> The inheritance based modeling results in very large, nearly empty
> objects.
>
> Moose models tend to be highly connected and tend to be used with badly
> predictable access patterns. That makes "standard databases" a bad match,
> especially if you cannot push querying to them.
>
> We are very close to having 64-bit Moose everywhere, shifting the
> problem from the size of the model directly to speed.

"Very close" seems a bit optimistic; for example, it will take some time
for Windows yet. The problem is that Synectique is already having
difficulties right now and is looking for shorter term solution(s).

> As the VM uses only one native thread and 8-thread machines are
> everywhere, the best speed-up should be expected from splitting the
> model over multiple Pharo images, and possibly over multiple machines.

Interesting idea. I am having some difficulty seeing how to split a model
into several parts that would have to link somehow to one another. Do you
have any further thoughts on this point?

nicolas

--
Nicolas Anquetil -- MCF (HDR)
Project-Team RMod
On Thu, Mar 30, 2017 at 07:15 Nicolas Anquetil <[hidden email]> wrote:
How do they link?
On 30/03/2017 16:39, Kjell Godo wrote:
Well, a model is a big graph where all entities (transitively) relate to
all other entities, so splitting the model over several Pharo images
implies having entities in one image referencing other entities in other
images. Not at all impossible, but this would be an interesting
engineering problem.

nicolas
--
Nicolas Anquetil -- MCF (HDR)
Project-Team RMod
2017-03-30 16:54 GMT+02:00 Nicolas Anquetil <[hidden email]>:
With Onil Goubier, we tried to publish a paper describing that mechanism
in Smalltalk in 1998, where the mechanism to establish links between
images was unified with the one storing the objects on disk. It was
rejected, but the reviews were encouraging.

The main engineering difficulty we saw back then was GC-ing over that
thing.

Regards,

Thierry
On 30/03/17 17:02, Thierry Goubier wrote:
> With Onil Goubier, we tried to publish a paper describing that
> mechanism in Smalltalk in 1998, where the mechanism to establish
> links between images was unified with the one storing the objects on
> disk. It was rejected, but the reviews were encouraging.
>
> The main engineering difficulty we saw back then was GC-ing over that
> thing.

Is that paper available somewhere?

Stephan
+1

On 30/03/2017 17:06, Stephan Eggermont wrote:
> Is that paper available somewhere?
--
Nicolas Anquetil -- MCF (HDR)
Project-Team RMod
2017-03-30 17:06 GMT+02:00 Stephan Eggermont <[hidden email]>:
I suspect I may have a backup of that on a Sun MD drive I haven't been
able to read since at least mid-1998 :( So the answer is no.

But the core idea was simple: use proxy objects, and when you touch the
proxy, either load the object from disk or forward it the message over
the network. Kind of what you would do in a distributed virtual shared
memory implementation combined with persistent storage. Use a page-based
mechanism for loading / unloading objects so as to reduce costs.

There is a guy in my lab working on DVSM; maybe that would be an
interesting subject.

Thierry
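PS: in Pharo terms, the proxy part could be sketched roughly like this. All the names here (FamixRemoteProxy, the broker and its resolve: / forward:to: selectors) are made up, and a real implementation would also have to deal with object identity, paging and GC:

ProtoObject subclass: #FamixRemoteProxy
    instanceVariableNames: 'broker oid'
    classVariableNames: ''
    package: 'Moose-Distribution-Sketch'

FamixRemoteProxy >> doesNotUnderstand: aMessage
    "ProtoObject understands almost nothing, so any message sent to the
    proxy lands here. Ask the (hypothetical) broker for the real object:
    it may read it from disk, or answer nil when the object lives in
    another image, in which case we forward the message over the network."
    | realObject |
    realObject := broker resolve: oid.
    ^ realObject
        ifNil: [ broker forward: aMessage to: oid ]
        ifNotNil: [ aMessage sendTo: realObject ]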
On 30/03/17 16:15, Nicolas Anquetil wrote:
> "very close" seems a bit optimistic. For example, it will take some > time for windows yet The problem is that Synectique is already > having difficulties right now and is looking for shorter term > solution(s) Short term would mean run a 64-bit linux in a vm or with a remote desktop. >> As the VM uses only one native thread and 8-thread machines are >> everywhere, the best speed-up should be expected from splitting >> the model over multiple pharo images, and possibly over multiple >> machines. >> > interesting idea, I am having some difficult seeing how to split a > model in several parts that would have to link somehow one to the > other. Do you have any further thoughts on this point? Splitting a model is indeed the interesting aspect. Either do it automatic based on usage, or use a heuristic. The navigation can be made distribution-aware to avoid doing only network-calls. Easiest is to make a hierarchical model that fits well with the subject, e.g. package-based. So everything inside the package is guaranteed to be in the image for some set of packages, and everything else is remote pointer. If you have enough images, you can have different combinations of packages in different images, and some mechanism to determine if you received a full answer yet. Stephan _______________________________________________ Moose-dev mailing list [hidden email] https://www.list.inf.unibe.ch/listinfo/moose-dev |