Hello Moose!

With the current memory limit of Pharo, and the size of the generated Moose models being potentially huge, maybe some of you have already thought about (or even experimented with) persistence solutions with query mechanisms that would instantiate FAMIX objects only “on demand”, in order to keep only part of a model in memory when working on a specific area.

If so, I would be really interested to hear about (or play with) it :)

At first look, I see that there is a MooseGroupStorage class. This kind of object answers some usual collection messages (add, remove, select, detect, ...). I guess that when we perform queries over a Moose model, or when we add or remove entity objects, we end up using this protocol.

So, if I wanted to implement a database persistence solution for Moose, my first feeling would be to implement a specific kind of “MooseGroupStorage” and to plug in there a communication layer with a database. Does it make sense?

I have not played with Moose for a long time (but I am back to play with it a lot more :)) and my vision of things may be naive. So do not hesitate to tell me if what I am saying sounds crazy, and to push me back onto the right path!

Has anyone already thought about solutions to deal with memory limits when generating big Moose models?

Cyrille Delaunay
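PS: a very rough sketch of what I have in mind. The class name and the selectors on the database side (store:, entitiesDo:) are purely illustrative, and the real MooseGroupStorage hooks may well be different:

MooseGroupStorage subclass: #DBMooseGroupStorage
    instanceVariableNames: 'dbConnection'
    classVariableNames: ''
    package: 'Moose-DB-Experiment'

DBMooseGroupStorage >> add: anEntity
    "Hand the entity to the database layer instead of keeping it in memory."
    dbConnection store: anEntity.
    ^ anEntity

DBMooseGroupStorage >> detect: aBlock ifNone: exceptionBlock
    "Naive version that streams entities back from the database one by one;
    a real implementation would translate aBlock into a database query."
    dbConnection entitiesDo: [ :each |
        (aBlock value: each) ifTrue: [ ^ each ] ].
    ^ exceptionBlock value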
Hi Cyrille,
Long time no see!

On 30/03/17 10:07, Cyrille Delaunay wrote:
> With the current memory limit of Pharo, and the size of the generated
> Moose models being potentially huge, maybe some of you have already
> thought about (or even experimented with) persistence solutions with
> query mechanisms that would instantiate FAMIX objects only “on demand”,
> in order to keep only part of a model in memory when working on a
> specific area.
>
> If so, I would be really interested to hear about (or play with) it :)

The current FAMIX based models are not suitable for large models.
The inheritance based modeling results in very large, nearly empty objects.

Moose models tend to be highly connected and tend to be used with badly
predictable access patterns. That makes "standard databases" a bad match,
especially if you cannot push querying to them.

We are very close to having 64-bit Moose everywhere, shifting the problem
from the size of the model directly to speed. As the VM uses only one
native thread and 8-thread machines are everywhere, the best speed-up
should be expected from splitting the model over multiple Pharo images,
and possibly over multiple machines.

Stephan
Hi Stephan,

thanks for your thoughts (further comments below)

On 30/03/2017 13:31, Stephan Eggermont wrote:
> Hi Cyrille,
> Long time no see!
>
> On 30/03/17 10:07, Cyrille Delaunay wrote:
>> With the current memory limit of Pharo, and the size of the generated
>> Moose models being potentially huge, maybe some of you have already
>> thought about (or even experimented with) persistence solutions with
>> query mechanisms that would instantiate FAMIX objects only “on demand”,
>> in order to keep only part of a model in memory when working on a
>> specific area.
>>
>> If so, I would be really interested to hear about (or play with) it :)
>
> The current FAMIX based models are not suitable for large models.
> The inheritance based modeling results in very large, nearly empty
> objects.
>
> Moose models tend to be highly connected and tend to be used with badly
> predictable access patterns. That makes "standard databases" a bad match,
> especially if you cannot push querying to them.
>
> We are very close to having 64-bit Moose everywhere, shifting the
> problem from the size of the model directly to speed.

"Very close" seems a bit optimistic; for example, it will take some time
for Windows yet. The problem is that Synectique is already having
difficulties right now and is looking for shorter term solution(s).

> As the VM uses only one native thread and 8-thread machines are
> everywhere, the best speed-up should be expected from splitting the
> model over multiple Pharo images, and possibly over multiple machines.

Interesting idea. I am having some difficulty seeing how to split a model
into several parts that would have to link somehow to one another. Do you
have any further thoughts on this point?

nicolas

--
Nicolas Anquetil -- MCF (HDR)
Project-Team RMod
On Thu, Mar 30, 2017 at 07:15 Nicolas Anquetil <[hidden email]> wrote:
How do they link?
On 30/03/2017 16:39, Kjell Godo wrote:
Well, a model is a big graph where all entities (transitively) relate to
all other entities, so splitting the model over several Pharo images
implies having entities in one image referencing other entities in other
images. Not at all impossible, but this would be an interesting
engineering problem.

nicolas
--
Nicolas Anquetil -- MCF (HDR)
Project-Team RMod
2017-03-30 16:54 GMT+02:00 Nicolas Anquetil <[hidden email]>:
With Onil Goubier, we tried to publish a paper describing that mechanism
in Smalltalk in 1998, where the mechanism to establish links between
images was unified with the one storing the objects on disk. It was
rejected, but the reviews were encouraging.

The main engineering difficulty we saw back then was GC-ing over that
thing.

Regards,

Thierry
On 30/03/17 17:02, Thierry Goubier wrote:
> With Onil Goubier, we tried to publish a paper describing that
> mechanism in Smalltalk in 1998, where the mechanism to establish
> links between images was unified with the one storing the objects on
> disk. It was rejected, but the reviews were encouraging.
>
> The main engineering difficulty we saw back then was GC-ing over that
> thing.

Is that paper available somewhere?

Stephan
+1

On 30/03/2017 17:06, Stephan Eggermont wrote:
> Is that paper available somewhere?
--
Nicolas Anquetil -- MCF (HDR)
Project-Team RMod
2017-03-30 17:06 GMT+02:00 Stephan Eggermont <[hidden email]>:
I suspect I may have a backup of that on a Sun MD drive I haven't been
able to read since at least mid-1998 :( So the answer is no.

But the core idea was simple: use proxy objects, and when you touch the
proxy, either load the object from disk or forward it the message over
the network. Kind of what you would do in a distributed virtual shared
memory implementation combined with persistent storage. Use a page-based
mechanism for loading / unloading objects so as to reduce costs.

There is a guy in my lab working on DVSM; maybe that would be an
interesting subject.

Thierry
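PS: in Pharo terms, the proxy part could be sketched roughly like this. All the names here (FamixRemoteProxy, the broker and its resolve: / forward:to: selectors) are made up, and a real implementation would also have to deal with object identity, paging and GC:

ProtoObject subclass: #FamixRemoteProxy
    instanceVariableNames: 'broker oid'
    classVariableNames: ''
    package: 'Moose-Distribution-Sketch'

FamixRemoteProxy >> doesNotUnderstand: aMessage
    "ProtoObject understands almost nothing, so any message sent to the
    proxy lands here. Ask the (hypothetical) broker for the real object:
    it may read it from disk, or answer nil when the object lives in
    another image, in which case we forward the message over the network."
    | realObject |
    realObject := broker resolve: oid.
    ^ realObject
        ifNil: [ broker forward: aMessage to: oid ]
        ifNotNil: [ aMessage sendTo: realObject ]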
On 30/03/17 16:15, Nicolas Anquetil wrote:
> "very close" seems a bit optimistic. For example, it will take some > time for windows yet The problem is that Synectique is already > having difficulties right now and is looking for shorter term > solution(s) Short term would mean run a 64-bit linux in a vm or with a remote desktop. >> As the VM uses only one native thread and 8-thread machines are >> everywhere, the best speed-up should be expected from splitting >> the model over multiple pharo images, and possibly over multiple >> machines. >> > interesting idea, I am having some difficult seeing how to split a > model in several parts that would have to link somehow one to the > other. Do you have any further thoughts on this point? Splitting a model is indeed the interesting aspect. Either do it automatic based on usage, or use a heuristic. The navigation can be made distribution-aware to avoid doing only network-calls. Easiest is to make a hierarchical model that fits well with the subject, e.g. package-based. So everything inside the package is guaranteed to be in the image for some set of packages, and everything else is remote pointer. If you have enough images, you can have different combinations of packages in different images, and some mechanism to determine if you received a full answer yet. Stephan _______________________________________________ Moose-dev mailing list [hidden email] https://www.list.inf.unibe.ch/listinfo/moose-dev |