scaling moose

Tudor Girba-2
Hi,

A topic raised recently is the scalability of Moose. Indeed, it would be great to regain some traction in this direction.

I think there are a couple of possibilities in this area:

- Gemstone. This is the straightforward idea, and it would probably match the FAMIX object-oriented model quite well. The realization might not be that easy, because we would need to go through all the details of language independence. Also, we would need a remote UI, or at least a bridge for Glamour. Perhaps the Seaside interface of Glamour, or another Seaside-based one, would be a good direction.

- Graph DBs. http://neo4j.org/ would be an interesting target here. I have no experience in this area, but I would be very interested in collaborating with someone on this.

- Relational DBs. I know that Marco D'Ambros worked on a solution to map FAMIX meta-descriptions to Glorp in VW, and it seemed to work. I think Alberto Bacchelli is using it, or has used it.

- Fuel. Loading with Fuel is about an order of magnitude faster than loading plain MSE: for ArgoUML 0.32, for example, I got 108s for the MSE file and 17s for the Fuel-based version. So, having a Fuel export next to the MSE file can be a significant improvement. However, this solution only works when you have computations that treat one model at a time (so it's not so nice if you have to switch between models all the time). I attached a simple script to analyze the difference between plain MSE loading and Fuel load variations; a sketch of the idea follows this list.

- Filtered loading. This solution works well when you have the full model but do not need all of it. FMMetaRepositoryFilter offers the infrastructure for filtered loading at the level of FAME: if you want to load a subpart of your model, you construct a sub-meta-model and then let Fame do the rest. This kind of works, but at the moment it is only available in code (look at the tests). The next step is to integrate this solution, replace the MooseImportingContext, and possibly offer a UI as well; a hypothetical sketch also follows this list.
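
To make the Fuel comparison concrete, here is a minimal sketch of the kind of timing the attached script does. It assumes importFromMSEStream: as the MSE entry point on MooseModel and uses Fuel's class-side conveniences; the file names are placeholders:

| model mseTime fuelTime |
"Time a plain MSE import (assumed entry point; file name is a placeholder)."
mseTime := Time millisecondsToRun: [
	model := MooseModel new.
	model importFromMSEStream: (FileStream readOnlyFileNamed: 'argouml-0.32.mse') ].

"Serialize the model once with Fuel, then time the materialization."
FLSerializer serialize: model toFileNamed: 'argouml-0.32.fuel'.
fuelTime := Time millisecondsToRun: [
	model := FLMaterializer materializeFromFileNamed: 'argouml-0.32.fuel' ].

Transcript
	show: 'MSE: ', mseTime printString, ' ms'; cr;
	show: 'Fuel: ', fuelTime printString, ' ms'; cr.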

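And for filtered loading, a purely hypothetical sketch of what an integrated version could look like. The filter:keeping: and importFromMSEStream:withMetamodel: selectors are invented for illustration, and MooseModel metamodel is assumed to answer the FAME meta-model; the FMMetaRepositoryFilter tests show the real protocol:

| subMetamodel model |
"Hypothetical: build a sub-meta-model keeping only the entities we care about."
subMetamodel := FMMetaRepositoryFilter
	filter: MooseModel metamodel
	keeping: #('FAMIX.Class' 'FAMIX.Method' 'FAMIX.Invocation').
"Hypothetical: import against the sub-meta-model, so Fame skips everything the filter dropped."
model := MooseModel new.
model
	importFromMSEStream: (FileStream readOnlyFileNamed: 'big-model.mse')
	withMetamodel: subMetamodel.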

If you want to pick up on any of these topics, let's open a separate thread and see what we can do.


Cheers,
Doru

--
www.tudorgirba.com

"We can create beautiful models in a vacuum.
But, to get them effective we have to deal with the inconvenience of reality."

Attachment: mse-vs-fuel.txt (913 bytes)

Re: scaling moose

Stephan Eggermont-3
Hi Doru,

When thinking about the scalability of Moose, what scenarios do you have in mind? Up to about half a terabyte, you can keep everything in main memory on a single machine cost-effectively; the main limitation there is the lack of a 64-bit VM and image. As far as I understand the access patterns involved, a main-memory or distributed-main-memory solution is far preferable for actually analyzing systems. What do you hope to achieve by going to disk? When we did the data conversion project, we thought about partitioning over multiple images but finally managed with partial loading.

Stephan

Re: scaling moose

EstebanLM
In reply to this post by Tudor Girba-2
Don't forget MongoDB and Riak. Both already have Pharo support (especially Mongo).
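
For Mongo, a purely illustrative sketch of a Voyage-style repository (the mapping of a full FAMIX model is not worked out here; the class and selector names follow Voyage, but take them as assumptions):

| repo model |
"Connect to a local Mongo database (host and database names are placeholders)."
repo := VOMongoRepository host: 'localhost' database: 'moose'.
"In practice this would be a real, populated model."
model := MooseModel new.
repo save: model.           "persist the model as a root object"
repo selectAll: MooseModel. "query the stored models back"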

cheers,
Esteban

Re: scaling moose

Tudor Girba-2
Good point.

Anyone interested in giving any of these a try?

Doru

--
www.tudorgirba.com

"Every thing has its own flow"

Re: scaling moose

Fabrizio Perin-3
In reply to this post by Stephan Eggermont-3
Hi Stef,
Sorry, but the problem is not that there are no 64-bit VMs and images available. The problem is that the available 32-bit VM and image cannot be pushed beyond 500MB. Even worse, as far as I understood, we are not even sure why that is.

For me, a reasonable size for a Moose image containing an average-size Java enterprise application is between 500MB and 1500MB. So a 32-bit VM/image should be perfectly able to store the whole model and still have enough free space for computations.

Partial loading could be a solution in some cases, but we need tool support for that. I cannot invest 2 weeks every time I need to script a 10-minute analysis, trying to figure out how to partially load the information that I "might" need. Without a full model available, the entire idea of prototyping analyses behind Moose goes down the drain, and Moose itself loses a lot of its meaning.
I think the whole point is to have all the data on the system under analysis at hand. Whether a 10GB model is stored in an image or the needed entities are loaded on demand is not relevant, as long as it is transparent for the user and the performance is not too bad.

Cheers,
Fabrizio

Re: scaling moose

Fabrizio Perin-3
Whoops.

Hi Stephan not Stef :)

Re: scaling moose

Stéphane Ducasse
In reply to this post by Fabrizio Perin-3

On Jul 25, 2012, at 4:58 PM, Fabrizio Perin wrote:

> Hi Stef,
> Sorry, but the problem is not that there are no 64-bit VMs and images available. The problem is that the available 32-bit VM and image cannot be pushed beyond 500MB. Even worse, as far as I understood, we are not even sure why that is.
>
> For me, a reasonable size for a Moose image containing an average-size Java enterprise application is between 500MB and 1500MB. So a 32-bit VM/image should be perfectly able to store the whole model and still have enough free space for computations.

I asked Igor.
Esteban?

> Partial loading could be a solution in some cases, but we need tool support for that. I cannot invest 2 weeks every time I need to script a 10-minute analysis, trying to figure out how to partially load the information that I "might" need. Without a full model available, the entire idea of prototyping analyses behind Moose goes down the drain, and Moose itself loses a lot of its meaning.

We are talking about clients to whom you would like to sell trend analysis. So in this case it is not just a question of memory optimization: you may want to have 50 or more versions of a system, and you have to think about reducing the number of elements (no need to represent self, this and others...).


> I think the whole point is to have all the data on the system under analysis at hand. Whether a 10GB model is stored in an image or the needed entities are loaded on demand is not relevant, as long as it is transparent for the user and the performance is not too bad.

Maybe, but so far I do not know how I can get 10GB at hand in Pharo.

Re: scaling moose

Stephan Eggermont-3
In reply to this post by Fabrizio Perin-3
Hi Fabrizio,

On Jul 25, 2012, at 4:58 PM, Fabrizio Perin wrote:
> Sorry, but the problem is not that there are no 64-bit VMs and images available. The problem is that the available 32-bit VM and image cannot be pushed beyond 500MB. Even worse, as far as I understood, we are not even sure why that is.

Is that Windows-only? I run larger Mac images.

> For me, a reasonable size for a Moose image containing an average-size Java enterprise application is between 500MB and 1500MB. So a 32-bit VM/image should be perfectly able to store the whole model and still have enough free space for computations.

For me, the difference between a 500MB model and a 2GB model is not really meaningful: it still imposes a significant limit on the size of models I can handle. I try to avoid loading as much as possible. A 588MB image starts in 3 seconds on my smallest machine, so that is fast enough.

> Partial loading could be a solution in some cases, but we need tool support for that. I cannot invest 2 weeks every time I need to script a 10-minute analysis, trying to figure out how to partially load the information that I "might" need. Without a full model available, the entire idea of prototyping analyses behind Moose goes down the drain, and Moose itself loses a lot of its meaning.

+1

> I think the whole point is to have all the data on the system under analysis at hand. Whether a 10GB model is stored in an image or the needed entities are loaded on demand is not relevant, as long as it is transparent for the user and the performance is not too bad.

I don't understand how performance can be good with a NoSQL or RDBMS system. Gemstone with enough RAM, or multiple Pharo images with distributed processing, yes; but copying all that data around sounds to me like a non-starter.

Stephan

Re: scaling moose

Stéphane Ducasse
In reply to this post by Fabrizio Perin-3
>> Hi Stef,
>> Sorry, but the problem is not that there are no 64-bit VMs and images available. The problem is that the available 32-bit VM and image cannot be pushed beyond 500MB. Even worse, as far as I understood, we are not even sure why that is.
>>
>> For me, a reasonable size for a Moose image containing an average-size Java enterprise application is between 500MB and 1500MB. So a 32-bit VM/image should be perfectly able to store the whole model and still have enough free space for computations.
>>
>> Is it true that we cannot go larger than 500MB?
>>
> Yes, the hardcoded limit is 500MB.

Do you know why?

> --
> Best regards,
> Igor Stasenko.