Graph library in Smalltalk - Need for advices

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
27 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Re: [squeak-dev] Re: Graph library in Smalltalk - Need for advices

Stéphane Ducasse
this is a nice analysis.

Now I think that if different lib are needed for people then starting to build some of them is important.
I know that Jannik also implemented and optimize tarjan and other graph algorithms for moose so
cedric it would be probably a good start to check that and probably to make the moose graph algo lib a separate entity. We (the moose people) also need a good graph lib.

Stef

> 1. Scalability
> 2. Data structures
> 3. Algorithms
> 4. Performance
> 5. Visualization
>
> I think you need to at least consider all the above when creating a library.
> In fact, you probably should break it up as many libraries do into several
> components:
>
> 1. Common, useful data structures (Vertex, Edge, Adjacency List, Adjaceny
> Matrix, etc.) that are optimized for your language - hopefully smalltalk
>
> 2. Computations / Measurements
>
> 3. Algorithms - layout, common graph problems, etc.
>
> 4. Persistence - database or otherwise, likely several supporting libs
> specific to the persistence mechanism or implementing some kind of pattern
> to be agnostic. The data structures should at least be designed in such a
> way that they are persistence friendly to put in something like Gemstone,
> Magma, Image, etc. See for example InfiniteGraph which has a Java interface
> built on Objectivity.
>
> 5. Visualization - desktop, web, and static output.
>
> The problem here is that creating a graph library is a huge undertaking. It
> might not sound like it, but they can grow to epic proportions. From my
> comments, you can see that it is really not the fault of any particular lib,
> but rather the subject matter. You could spend a lifetime packing in
> features. The real key is just to create a series of libs that work well
> together and not constantly reinvent the wheel with node classes in each
> library.
>
> A secondary problem is doing graph analysis and even drawing graphs can take
> a lot of horse power. Parallelism is a tough issue with graph libraries and
> there's a multitude of approaches from pure threads to map/reduce to random
> walking and stream processing.  This is further compounded by persistence
> where you need to start doing things like graph healing, partitioning, and
> sharding to scale to massive levels and maintain performance.
>
> I just want to mention to others that graphs are hugely useful in general to
> solve a variety of problems from recommendations to meta programming. It's
> not just pretty pictures and experiments with molecules. There's a lot of
> potential in Smalltalk to do something since generally I feel Smalltalkers
> aren't bound by or afraid of new approaches to old problems. Graph databases
> in many cases could replace relational DBs for example and let your app do
> all kinds of stuff you might never have thought of otherwise.
>
> I could go on and on, but I'll stop myself here. Comments or thoughts?
>
>
>
>
>
>
> Stéphane Ducasse wrote:
>>
>>>
>>> I've started looking at the exemples YossiDM gave to me and in particular
>>> Lemon which was according to him his best experience. I found the model
>>> quite clear and covering all what I expect for a generic graph lib
>>> (directed, undirected, mapping concept, iterators, and algorithms of
>>> course). Moreover and contrary to Boost, it's still developed in 2010.
>>>
>>> To be more precise, here's what I expect for a generic graph lib in
>>> smalltalk (note all in Lemon except visualization):
>>>
>>> - data structure: directed graphs, undirected graphs, possible loop and
>>> parallel edged, ..., trees (?)
>>> - mapping: easily map objects, informations on nodes and/or edges (here,
>>> don't know if I'd like subclassing nodes/eges instead...)
>>> - iterators: efficient way to iterate over nodes and edges
>>> - algorithms: basic algorithms implementation (bfs, dfs, ..., shortest
>>> paths, ...), and plug-ability for specific ones...
>>> - visualization: having an interactive graph visualization web/SVG and
>>> eventually morphic (... graphviz, mondrian, .......)
>>>
>>> then, I could use this for my research work...
>>> - I need "belief" nodes with associated conditional beliefs tables
>>> - I need to implement algorithms to propagate an information change
>>> (change of a node state) in any nodes...
>>> ****mainly, I'd like to get junction trees from a graph [1] which are
>>> rely useful for several domains in machine learning field ***
>>>
>>> Actually, I don't know if I really need a graph lib as a simple
>>> implementation directed to bayesian should be enough but it's the second
>>> time I need graphs (last time was for planification) and I think that
>>> would be great to have a nice and clean basic implementation.
>>>
>>> Couldn't we start developing something similar to Lemon (regarding "API",
>>> enitites, etc...) that would work for small scale project project in
>>> smalltalk ?
>>
>> It would be excellent.
>> Because now that you have a full time permanent position you can invest a
>> bit and in 2 years you can get something really sexy....
>> This is what we are doing all the time around pharo.
>>
>>> Yossi, what were the limitations you found with Lemon ?
>>>
>>> Cheers,
>>>
>>> Cédrick
>>>
>>
>>
>>
>>
>
> --
> View this message in context: http://forum.world.st/Graph-library-in-Smalltalk-Need-for-advices-tp3092747p3159886.html
> Sent from the Pharo Smalltalk mailing list archive at Nabble.com.
>


Reply | Threaded
Open this post in threaded view
|

Re: [squeak-dev] Re: Graph library in Smalltalk - Need for advices

Tudor Girba
Like I wrote before, MooseAlgos is a separate and independent project.

But, yes, it would be really great to have a stronger effort around these topics, and the Moose community could provide the right environment for this.

Cheers,
Doru


On 22 Dec 2010, at 08:50, Stéphane Ducasse wrote:

> this is a nice analysis.
>
> Now I think that if different lib are needed for people then starting to build some of them is important.
> I know that Jannik also implemented and optimize tarjan and other graph algorithms for moose so
> cedric it would be probably a good start to check that and probably to make the moose graph algo lib a separate entity. We (the moose people) also need a good graph lib.
>
> Stef
>
>> 1. Scalability
>> 2. Data structures
>> 3. Algorithms
>> 4. Performance
>> 5. Visualization
>>
>> I think you need to at least consider all the above when creating a library.
>> In fact, you probably should break it up as many libraries do into several
>> components:
>>
>> 1. Common, useful data structures (Vertex, Edge, Adjacency List, Adjaceny
>> Matrix, etc.) that are optimized for your language - hopefully smalltalk
>>
>> 2. Computations / Measurements
>>
>> 3. Algorithms - layout, common graph problems, etc.
>>
>> 4. Persistence - database or otherwise, likely several supporting libs
>> specific to the persistence mechanism or implementing some kind of pattern
>> to be agnostic. The data structures should at least be designed in such a
>> way that they are persistence friendly to put in something like Gemstone,
>> Magma, Image, etc. See for example InfiniteGraph which has a Java interface
>> built on Objectivity.
>>
>> 5. Visualization - desktop, web, and static output.
>>
>> The problem here is that creating a graph library is a huge undertaking. It
>> might not sound like it, but they can grow to epic proportions. From my
>> comments, you can see that it is really not the fault of any particular lib,
>> but rather the subject matter. You could spend a lifetime packing in
>> features. The real key is just to create a series of libs that work well
>> together and not constantly reinvent the wheel with node classes in each
>> library.
>>
>> A secondary problem is doing graph analysis and even drawing graphs can take
>> a lot of horse power. Parallelism is a tough issue with graph libraries and
>> there's a multitude of approaches from pure threads to map/reduce to random
>> walking and stream processing.  This is further compounded by persistence
>> where you need to start doing things like graph healing, partitioning, and
>> sharding to scale to massive levels and maintain performance.
>>
>> I just want to mention to others that graphs are hugely useful in general to
>> solve a variety of problems from recommendations to meta programming. It's
>> not just pretty pictures and experiments with molecules. There's a lot of
>> potential in Smalltalk to do something since generally I feel Smalltalkers
>> aren't bound by or afraid of new approaches to old problems. Graph databases
>> in many cases could replace relational DBs for example and let your app do
>> all kinds of stuff you might never have thought of otherwise.
>>
>> I could go on and on, but I'll stop myself here. Comments or thoughts?
>>
>>
>>
>>
>>
>>
>> Stéphane Ducasse wrote:
>>>
>>>>
>>>> I've started looking at the exemples YossiDM gave to me and in particular
>>>> Lemon which was according to him his best experience. I found the model
>>>> quite clear and covering all what I expect for a generic graph lib
>>>> (directed, undirected, mapping concept, iterators, and algorithms of
>>>> course). Moreover and contrary to Boost, it's still developed in 2010.
>>>>
>>>> To be more precise, here's what I expect for a generic graph lib in
>>>> smalltalk (note all in Lemon except visualization):
>>>>
>>>> - data structure: directed graphs, undirected graphs, possible loop and
>>>> parallel edged, ..., trees (?)
>>>> - mapping: easily map objects, informations on nodes and/or edges (here,
>>>> don't know if I'd like subclassing nodes/eges instead...)
>>>> - iterators: efficient way to iterate over nodes and edges
>>>> - algorithms: basic algorithms implementation (bfs, dfs, ..., shortest
>>>> paths, ...), and plug-ability for specific ones...
>>>> - visualization: having an interactive graph visualization web/SVG and
>>>> eventually morphic (... graphviz, mondrian, .......)
>>>>
>>>> then, I could use this for my research work...
>>>> - I need "belief" nodes with associated conditional beliefs tables
>>>> - I need to implement algorithms to propagate an information change
>>>> (change of a node state) in any nodes...
>>>> ****mainly, I'd like to get junction trees from a graph [1] which are
>>>> rely useful for several domains in machine learning field ***
>>>>
>>>> Actually, I don't know if I really need a graph lib as a simple
>>>> implementation directed to bayesian should be enough but it's the second
>>>> time I need graphs (last time was for planification) and I think that
>>>> would be great to have a nice and clean basic implementation.
>>>>
>>>> Couldn't we start developing something similar to Lemon (regarding "API",
>>>> enitites, etc...) that would work for small scale project project in
>>>> smalltalk ?
>>>
>>> It would be excellent.
>>> Because now that you have a full time permanent position you can invest a
>>> bit and in 2 years you can get something really sexy....
>>> This is what we are doing all the time around pharo.
>>>
>>>> Yossi, what were the limitations you found with Lemon ?
>>>>
>>>> Cheers,
>>>>
>>>> Cédrick
>>>>
>>>
>>>
>>>
>>>
>>
>> --
>> View this message in context: http://forum.world.st/Graph-library-in-Smalltalk-Need-for-advices-tp3092747p3159886.html
>> Sent from the Pharo Smalltalk mailing list archive at Nabble.com.
>>
>
>

--
www.tudorgirba.com

"Speaking louder won't make the point worthier."


Reply | Threaded
Open this post in threaded view
|

Re: [squeak-dev] Re: Graph library in Smalltalk - Need for advices

cedreek
In reply to this post by YossiDM



Sorry for the late reply, I've been away on business. I'm going to have to
make one of those unfortunate long posts, so hopefully I can hold the
attention span of those already interested anyway.

Not so long and very interesting so please continue... :)
some comments below (remember I have not a lot of knowledge and experience on graphs)...


Cédrick - I found the main limitation with Lemon to be that it was a pain to
extend as well as to integrate a nice persistence layer.

ok. Pain to extend sounds bad... but as it seems to scale well, it could be a candidate for interoperability with a smalltalk lib...

[...]

It's been my experience after using probably 20 different graph libraries in
real projects over the years that the following themes reoccur:

that's really why your opinion is really important !


1. Library is good for visualization, but not for calculations - shortest
path, centrality, MST, etc.

2. Library is good for algorithms but has no visualization support

ok.
To me 1 and 2 should be dissociated anyway.


3. Graph library relies too much on other libraries that are not that
impressive either. Graphviz comes to mine. Great tool, but frankly the
visualizations looks awful by today's standards and it's harder to do
something fully dynamic. Piping a dot file to a unix process to produce a
static image is not terribly useful in things like web applications.

I fully agree.


4. Library does not scale predictably or to a certain point.

This is an aspect I care not much as I said I only work on small scale project.
But this is definitely an important point. At least to know the limitations.


5. Library does not support persistence properly.

A point I don't know.


6. Library forces an API that takes over your data too much or doesn't play
well with existing data. Any model that forces inheritance for Vertex and
Edge classes comes to mind. See neo4j for some examples of a poor API on
this point.

I think this is a really important point.
Is the mapping concept of lemon useful here  ?


7. Library is good all around but lacks depth (missing algorithms, measures,
etc) and does not play well with others.

entry point for extensions ? patterns ?


8. Every library has its own idea of what a vertex and edge should be and
there's no good way of passing data between them in a nice manner.

pass data between libs ?
Maybe once we get a lib in smalltalk, then other lib can be selected


9. Visualization libraries have a multitude of issues - specific to desktop
or web only, static output, no capability to annotate vertex or edges, no
pruning... I have yet to find one that is both pretty and useful to this day
without nearly recreating a new library each time.

ok.


The reality is every library will have to make compromises at a minimum in
the following areas:

1. Scalability
2. Data structures
3. Algorithms
4. Performance
5. Visualization

To me, (might be naive) we should have a nice and clean model for 2, 3 and 4. 
Performance is to me more about the complexity of an operation (ie. algorithm variations that rely on the chosen graph (internal) data structure).

Scalability (1) is really important but is another point, especially in Smalltalk. Would you use primitives, parallelism, multi-image, bridges to existing lib... , quantic vm (kidding :o) )?
What's more important is probably to have a good documentation stating what the limitations are.

Moreover, it seems 90% of interested people (here) in a graph lib are a priori interested in small scale apps.

Then another lib for 5 (might be mondrian based)...



I think you need to at least consider all the above when creating a library.

fully agree.
I guess we cannot claim to have the ultimate lib. But I would be very happy to have something smart "a la smalltalk" :)

In fact, you probably should break it up as many libraries do into several
components:

1. Common, useful data structures (Vertex, Edge, Adjacency List, Adjaceny
Matrix, etc.) that are optimized for your language - hopefully smalltalk

YES
but then what strategy to map objects ?


2. Computations / Measurements

for instances ?

Where do you classify operations like moralization, variable elimination, junction tree transformation ? 
(yes it's directed to my need ;-) )


3. Algorithms - layout, common graph problems, etc.

4. Persistence - database or otherwise, likely several supporting libs
specific to the persistence mechanism or implementing some kind of pattern
to be agnostic.

would be great !

The data structures should at least be designed in such a
way that they are persistence friendly to put in something like Gemstone,
Magma, Image, etc. See for example InfiniteGraph which has a Java interface
built on Objectivity.

I guess gemstone (and other persistency players) could play a nice role here... :)


5. Visualization - desktop, web, and static output.

The problem here is that creating a graph library is a huge undertaking. It
might not sound like it, but they can grow to epic proportions. From my
comments, you can see that it is really not the fault of any particular lib,
but rather the subject matter. You could spend a lifetime packing in
features. The real key is just to create a series of libs that work well
together and not constantly reinvent the wheel with node classes in each
library.

yes I understand that :(  
To me, the problem is that graph is a too much generic term and involve too many possible processing that are not easily generalizable (cyclic/acyclic, directed/undirected, parallel/looping edges, a tree is a graph too...).
For instance, it's easier to me to implement a tree algorithm on a straight tree structure than on a particular graph being a tree.



A secondary problem is doing graph analysis and even drawing graphs can take
a lot of horse power. Parallelism is a tough issue with graph libraries and
there's a multitude of approaches from pure threads to map/reduce to random
walking and stream processing.  This is further compounded by persistence
where you need to start doing things like graph healing, partitioning, and
sharding to scale to massive levels and maintain performance.

looks hard indeed.


I just want to mention to others that graphs are hugely useful in general to
solve a variety of problems from recommendations to meta programming. It's
not just pretty pictures and experiments with molecules. There's a lot of
potential in Smalltalk to do something since generally I feel Smalltalkers
aren't bound by or afraid of new approaches to old problems. Graph databases
in many cases could replace relational DBs for example and let your app do
all kinds of stuff you might never have thought of otherwise.

I could go on and on, but I'll stop myself here. Comments or thoughts?

My main thought is that I think you're clearly the most experienced guy here. That's why I'd like you to be the "chief architect" if we start a project around graphs.

Do you have some design ideas for:
Data structures (2)  with in mind Algorithms (3), Performance (4) and Scalability (1)
Again I think visualization (5) is another topic. It could come later...

Or in other word, could we start designing graph data structures as a project ? 
I'm pretty sure you have some strong opinions on their design so as not to sacrifice (too much) 3, 4 and 1 ;-). Or at least, you will know the limitations and consequences of design choices...

What do you think ? If you don't have enough time, you could just give directions and eventually review the code...

As Stephane said, this is not an urgent project and we don't have to build the ultimate lib...


Cheers,

Cédrick





- data structure: directed graphs, undirected graphs, possible loop and
parallel edged, ..., trees (?)
- mapping: easily map objects, informations on nodes and/or edges (here,
don't know if I'd like subclassing nodes/eges instead...)
- iterators: efficient way to iterate over nodes and edges
- algorithms: basic algorithms implementation (bfs, dfs, ..., shortest
paths, ...), and plug-ability for specific ones...
- visualization: having an interactive graph visualization web/SVG and
eventually morphic (... graphviz, mondrian, .......)

then, I could use this for my research work...
- I need "belief" nodes with associated conditional beliefs tables
- I need to implement algorithms to propagate an information change
(change of a node state) in any nodes...
****mainly, I'd like to get junction trees from a graph [1] which are
rely useful for several domains in machine learning field ***

Actually, I don't know if I really need a graph lib as a simple
implementation directed to bayesian should be enough but it's the second
time I need graphs (last time was for planification) and I think that
would be great to have a nice and clean basic implementation.

Couldn't we start developing something similar to Lemon (regarding "API",
enitites, etc...) that would work for small scale project project in
smalltalk ?


Reply | Threaded
Open this post in threaded view
|

Re: Graph library in Smalltalk - Need for advices

Chris Muller-3
In reply to this post by Alexandre Bergel
>> Hi Alexandre, I only have very limited experience with Mondrian, so
>> please excuse me if this is a stupid question.  Is it possible to
>> employ an Iterator to simply draw on a raster image rather than
>> creating a model in RAM?  Perhaps it could be an option?  Would it be
>> hard to make Mondrian work this way?
>
> I am not sure what you mean by raster image.

Raster image = bitmap.  As you know, bitmaps have the ability, no
matter whether you render 1 node or 1-billion nodes and edges on it,
it does not take up much more space.

But because Mondrian builds an object-model in RAM (from which it
renders), there is a definite  scalability limit in terms of the
number of nodes.  Rendering also takes more time due to
object-allocation, etc.

The only reasons I can think to need to create a full domain model of
the graph is:

  1) to calculate positioning
  2) for user-input; being able to drag the nodes around (thus
updating the Mondrian domain model, for subsequent display updates).

(1) _might_ be able to be accomodated by, when generating the picture,
only keeping track of various node counts at each level and/or overall
bounds of the picture.

(2) is what I'm arguing might not be a good trade-off in all cases.
Sure, the output is now a static picture instead of something with
draggable boxes, but at the cost of speed and scale.

Does that make sense?  I find Mondrian interesting, thanks for your feedback.

 - Chris

Reply | Threaded
Open this post in threaded view
|

Re: Graph library in Smalltalk - Need for advices

Alexandre Bergel
> Raster image = bitmap.  As you know, bitmaps have the ability, no
> matter whether you render 1 node or 1-billion nodes and edges on it,
> it does not take up much more space.

It depends on the size of the bitmap :-)

> But because Mondrian builds an object-model in RAM (from which it
> renders), there is a definite  scalability limit in terms of the
> number of nodes.  Rendering also takes more time due to
> object-allocation, etc.

In Mondrian, each object is rendered via a shape. There is a number of shapes, one of them is MOImageShape, which displays a bitmap.

> The only reasons I can think to need to create a full domain model of
> the graph is:
>
>  1) to calculate positioning
>  2) for user-input; being able to drag the nodes around (thus
> updating the Mondrian domain model, for subsequent display updates).

I am not sure to understand. Suppose you have a domain (e.g., cars, persons and ownership). You can render this domain the way you want. Dragging nodes does not changes your models. It changes the graph built by Mondrian (i.e., instances of MONode and MOEdes).

> (1) _might_ be able to be accomodated by, when generating the picture,
> only keeping track of various node counts at each level and/or overall
> bounds of the picture.

Mondrian has a cache mechanism that does exactly that. A node may contains elements. A bitmap is created for you to speed the things up.

> (2) is what I'm arguing might not be a good trade-off in all cases.
> Sure, the output is now a static picture instead of something with
> draggable boxes, but at the cost of speed and scale.
>
> Does that make sense?  I find Mondrian interesting, thanks for your feedback.

You can drag and drop inner nodes, the caches is reset without any particular action.

What is exactly the problem you're trying to solve?

Alexandre

--
_,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:
Alexandre Bergel  http://www.bergel.eu
^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.






Reply | Threaded
Open this post in threaded view
|

Re: Graph library in Smalltalk - Need for advices

Chris Muller-4
On Wed, Dec 22, 2010 at 11:09 AM, Alexandre Bergel <[hidden email]> wrote:
>> Raster image = bitmap.  As you know, bitmaps have the ability, no
>> matter whether you render 1 node or 1-billion nodes and edges on it,
>> it does not take up much more space.
>
> It depends on the size of the bitmap :-)

Not really; if you start with an empty (say, all white), uncompressed
bitmap, painting various graphics on it does not change its
uncompressed size.

> I am not sure to understand. Suppose you have a domain (e.g., cars, persons and ownership). You can render this domain the way you want. Dragging nodes does not changes your models. It changes the graph built by Mondrian (i.e., instances of MONode and MOEdes).

Right.  I am not talking about the cars and persons, I am talking
about "Mondrians domain model", "(i.e., instances of MONode and
MOEdes)".  If I have 1 million cars and persons and I want Mondrian to
render a visualization involving all of them, then it has to create
and *keep* 1 million MONodes.  I want to do it without keeping them
since that cannot scale.

Assume, perhaps, that I have my 1 million Cars in a Magma database,
and I don't want to persist 1M MONodes just to render a quick picture.

> What is exactly the problem you're trying to solve?

The (imaginary) problem I'm interested to solve is I have a huge model
persisted in Magma, I want to make a visualization based on one
million+ objects without creating and keeping 1M MONodes in memory....
 Instead, I would like to create each MONode, render it on the raster
image, but don't keep it so it can be GC'd.  But its representation is
already preserved on the bitmap.

Reply | Threaded
Open this post in threaded view
|

Re: Graph library in Smalltalk - Need for advices

askoh
Administrator
In reply to this post by cedreek
Check out JUN for Smalltalk.
http://aokilab.kyoto-su.ac.jp/jun/index.html
12