this is a nice analysis.
Now I think that if different lib are needed for people then starting to build some of them is important. I know that Jannik also implemented and optimize tarjan and other graph algorithms for moose so cedric it would be probably a good start to check that and probably to make the moose graph algo lib a separate entity. We (the moose people) also need a good graph lib. Stef > 1. Scalability > 2. Data structures > 3. Algorithms > 4. Performance > 5. Visualization > > I think you need to at least consider all the above when creating a library. > In fact, you probably should break it up as many libraries do into several > components: > > 1. Common, useful data structures (Vertex, Edge, Adjacency List, Adjaceny > Matrix, etc.) that are optimized for your language - hopefully smalltalk > > 2. Computations / Measurements > > 3. Algorithms - layout, common graph problems, etc. > > 4. Persistence - database or otherwise, likely several supporting libs > specific to the persistence mechanism or implementing some kind of pattern > to be agnostic. The data structures should at least be designed in such a > way that they are persistence friendly to put in something like Gemstone, > Magma, Image, etc. See for example InfiniteGraph which has a Java interface > built on Objectivity. > > 5. Visualization - desktop, web, and static output. > > The problem here is that creating a graph library is a huge undertaking. It > might not sound like it, but they can grow to epic proportions. From my > comments, you can see that it is really not the fault of any particular lib, > but rather the subject matter. You could spend a lifetime packing in > features. The real key is just to create a series of libs that work well > together and not constantly reinvent the wheel with node classes in each > library. > > A secondary problem is doing graph analysis and even drawing graphs can take > a lot of horse power. Parallelism is a tough issue with graph libraries and > there's a multitude of approaches from pure threads to map/reduce to random > walking and stream processing. This is further compounded by persistence > where you need to start doing things like graph healing, partitioning, and > sharding to scale to massive levels and maintain performance. > > I just want to mention to others that graphs are hugely useful in general to > solve a variety of problems from recommendations to meta programming. It's > not just pretty pictures and experiments with molecules. There's a lot of > potential in Smalltalk to do something since generally I feel Smalltalkers > aren't bound by or afraid of new approaches to old problems. Graph databases > in many cases could replace relational DBs for example and let your app do > all kinds of stuff you might never have thought of otherwise. > > I could go on and on, but I'll stop myself here. Comments or thoughts? > > > > > > > Stéphane Ducasse wrote: >> >>> >>> I've started looking at the exemples YossiDM gave to me and in particular >>> Lemon which was according to him his best experience. I found the model >>> quite clear and covering all what I expect for a generic graph lib >>> (directed, undirected, mapping concept, iterators, and algorithms of >>> course). Moreover and contrary to Boost, it's still developed in 2010. >>> >>> To be more precise, here's what I expect for a generic graph lib in >>> smalltalk (note all in Lemon except visualization): >>> >>> - data structure: directed graphs, undirected graphs, possible loop and >>> parallel edged, ..., trees (?) >>> - mapping: easily map objects, informations on nodes and/or edges (here, >>> don't know if I'd like subclassing nodes/eges instead...) >>> - iterators: efficient way to iterate over nodes and edges >>> - algorithms: basic algorithms implementation (bfs, dfs, ..., shortest >>> paths, ...), and plug-ability for specific ones... >>> - visualization: having an interactive graph visualization web/SVG and >>> eventually morphic (... graphviz, mondrian, .......) >>> >>> then, I could use this for my research work... >>> - I need "belief" nodes with associated conditional beliefs tables >>> - I need to implement algorithms to propagate an information change >>> (change of a node state) in any nodes... >>> ****mainly, I'd like to get junction trees from a graph [1] which are >>> rely useful for several domains in machine learning field *** >>> >>> Actually, I don't know if I really need a graph lib as a simple >>> implementation directed to bayesian should be enough but it's the second >>> time I need graphs (last time was for planification) and I think that >>> would be great to have a nice and clean basic implementation. >>> >>> Couldn't we start developing something similar to Lemon (regarding "API", >>> enitites, etc...) that would work for small scale project project in >>> smalltalk ? >> >> It would be excellent. >> Because now that you have a full time permanent position you can invest a >> bit and in 2 years you can get something really sexy.... >> This is what we are doing all the time around pharo. >> >>> Yossi, what were the limitations you found with Lemon ? >>> >>> Cheers, >>> >>> Cédrick >>> >> >> >> >> > > -- > View this message in context: http://forum.world.st/Graph-library-in-Smalltalk-Need-for-advices-tp3092747p3159886.html > Sent from the Pharo Smalltalk mailing list archive at Nabble.com. > |
Like I wrote before, MooseAlgos is a separate and independent project.
But, yes, it would be really great to have a stronger effort around these topics, and the Moose community could provide the right environment for this. Cheers, Doru On 22 Dec 2010, at 08:50, Stéphane Ducasse wrote: > this is a nice analysis. > > Now I think that if different lib are needed for people then starting to build some of them is important. > I know that Jannik also implemented and optimize tarjan and other graph algorithms for moose so > cedric it would be probably a good start to check that and probably to make the moose graph algo lib a separate entity. We (the moose people) also need a good graph lib. > > Stef > >> 1. Scalability >> 2. Data structures >> 3. Algorithms >> 4. Performance >> 5. Visualization >> >> I think you need to at least consider all the above when creating a library. >> In fact, you probably should break it up as many libraries do into several >> components: >> >> 1. Common, useful data structures (Vertex, Edge, Adjacency List, Adjaceny >> Matrix, etc.) that are optimized for your language - hopefully smalltalk >> >> 2. Computations / Measurements >> >> 3. Algorithms - layout, common graph problems, etc. >> >> 4. Persistence - database or otherwise, likely several supporting libs >> specific to the persistence mechanism or implementing some kind of pattern >> to be agnostic. The data structures should at least be designed in such a >> way that they are persistence friendly to put in something like Gemstone, >> Magma, Image, etc. See for example InfiniteGraph which has a Java interface >> built on Objectivity. >> >> 5. Visualization - desktop, web, and static output. >> >> The problem here is that creating a graph library is a huge undertaking. It >> might not sound like it, but they can grow to epic proportions. From my >> comments, you can see that it is really not the fault of any particular lib, >> but rather the subject matter. You could spend a lifetime packing in >> features. The real key is just to create a series of libs that work well >> together and not constantly reinvent the wheel with node classes in each >> library. >> >> A secondary problem is doing graph analysis and even drawing graphs can take >> a lot of horse power. Parallelism is a tough issue with graph libraries and >> there's a multitude of approaches from pure threads to map/reduce to random >> walking and stream processing. This is further compounded by persistence >> where you need to start doing things like graph healing, partitioning, and >> sharding to scale to massive levels and maintain performance. >> >> I just want to mention to others that graphs are hugely useful in general to >> solve a variety of problems from recommendations to meta programming. It's >> not just pretty pictures and experiments with molecules. There's a lot of >> potential in Smalltalk to do something since generally I feel Smalltalkers >> aren't bound by or afraid of new approaches to old problems. Graph databases >> in many cases could replace relational DBs for example and let your app do >> all kinds of stuff you might never have thought of otherwise. >> >> I could go on and on, but I'll stop myself here. Comments or thoughts? >> >> >> >> >> >> >> Stéphane Ducasse wrote: >>> >>>> >>>> I've started looking at the exemples YossiDM gave to me and in particular >>>> Lemon which was according to him his best experience. I found the model >>>> quite clear and covering all what I expect for a generic graph lib >>>> (directed, undirected, mapping concept, iterators, and algorithms of >>>> course). Moreover and contrary to Boost, it's still developed in 2010. >>>> >>>> To be more precise, here's what I expect for a generic graph lib in >>>> smalltalk (note all in Lemon except visualization): >>>> >>>> - data structure: directed graphs, undirected graphs, possible loop and >>>> parallel edged, ..., trees (?) >>>> - mapping: easily map objects, informations on nodes and/or edges (here, >>>> don't know if I'd like subclassing nodes/eges instead...) >>>> - iterators: efficient way to iterate over nodes and edges >>>> - algorithms: basic algorithms implementation (bfs, dfs, ..., shortest >>>> paths, ...), and plug-ability for specific ones... >>>> - visualization: having an interactive graph visualization web/SVG and >>>> eventually morphic (... graphviz, mondrian, .......) >>>> >>>> then, I could use this for my research work... >>>> - I need "belief" nodes with associated conditional beliefs tables >>>> - I need to implement algorithms to propagate an information change >>>> (change of a node state) in any nodes... >>>> ****mainly, I'd like to get junction trees from a graph [1] which are >>>> rely useful for several domains in machine learning field *** >>>> >>>> Actually, I don't know if I really need a graph lib as a simple >>>> implementation directed to bayesian should be enough but it's the second >>>> time I need graphs (last time was for planification) and I think that >>>> would be great to have a nice and clean basic implementation. >>>> >>>> Couldn't we start developing something similar to Lemon (regarding "API", >>>> enitites, etc...) that would work for small scale project project in >>>> smalltalk ? >>> >>> It would be excellent. >>> Because now that you have a full time permanent position you can invest a >>> bit and in 2 years you can get something really sexy.... >>> This is what we are doing all the time around pharo. >>> >>>> Yossi, what were the limitations you found with Lemon ? >>>> >>>> Cheers, >>>> >>>> Cédrick >>>> >>> >>> >>> >>> >> >> -- >> View this message in context: http://forum.world.st/Graph-library-in-Smalltalk-Need-for-advices-tp3092747p3159886.html >> Sent from the Pharo Smalltalk mailing list archive at Nabble.com. >> > > -- www.tudorgirba.com "Speaking louder won't make the point worthier." |
In reply to this post by YossiDM
Not so long and very interesting so please continue... :) some comments below (remember I have not a lot of knowledge and experience on graphs)...
ok. Pain to extend sounds bad... but as it seems to scale well, it could be a candidate for interoperability with a smalltalk lib...
that's really why your opinion is really important !
ok. To me 1 and 2 should be dissociated anyway.
I fully agree.
This is an aspect I care not much as I said I only work on small scale project. But this is definitely an important point. At least to know the limitations.
A point I don't know.
I think this is a really important point. Is the mapping concept of lemon useful here ?
entry point for extensions ? patterns ?
pass data between libs ? Maybe once we get a lib in smalltalk, then other lib can be selected
ok.
To me, (might be naive) we should have a nice and clean model for 2, 3 and 4. Performance is to me more about the complexity of an operation (ie. algorithm variations that rely on the chosen graph (internal) data structure). Scalability (1) is really important but is another point, especially in Smalltalk. Would you use primitives, parallelism, multi-image, bridges to existing lib... , quantic vm (kidding :o) )? What's more important is probably to have a good documentation stating what the limitations are. Moreover, it seems 90% of interested people (here) in a graph lib are a priori interested in small scale apps. Then another lib for 5 (might be mondrian based)...
fully agree. I guess we cannot claim to have the ultimate lib. But I would be very happy to have something smart "a la smalltalk" :)
YES but then what strategy to map objects ?
for instances ? Where do you classify operations like moralization, variable elimination, junction tree transformation ? (yes it's directed to my need ;-) )
would be great !
I guess gemstone (and other persistency players) could play a nice role here... :)
yes I understand that :( To me, the problem is that graph is a too much generic term and involve too many possible processing that are not easily generalizable (cyclic/acyclic, directed/undirected, parallel/looping edges, a tree is a graph too...). For instance, it's easier to me to implement a tree algorithm on a straight tree structure than on a particular graph being a tree.
looks hard indeed.
My main thought is that I think you're clearly the most experienced guy here. That's why I'd like you to be the "chief architect" if we start a project around graphs. Do you have some design ideas for: Data structures (2) with in mind Algorithms (3), Performance (4) and Scalability (1) Again I think visualization (5) is another topic. It could come later... Or in other word, could we start designing graph data structures as a project ? I'm pretty sure you have some strong opinions on their design so as not to sacrifice (too much) 3, 4 and 1 ;-). Or at least, you will know the limitations and consequences of design choices... What do you think ? If you don't have enough time, you could just give directions and eventually review the code... As Stephane said, this is not an urgent project and we don't have to build the ultimate lib... Cheers, Cédrick
|
In reply to this post by Alexandre Bergel
>> Hi Alexandre, I only have very limited experience with Mondrian, so
>> please excuse me if this is a stupid question. Is it possible to >> employ an Iterator to simply draw on a raster image rather than >> creating a model in RAM? Perhaps it could be an option? Would it be >> hard to make Mondrian work this way? > > I am not sure what you mean by raster image. Raster image = bitmap. As you know, bitmaps have the ability, no matter whether you render 1 node or 1-billion nodes and edges on it, it does not take up much more space. But because Mondrian builds an object-model in RAM (from which it renders), there is a definite scalability limit in terms of the number of nodes. Rendering also takes more time due to object-allocation, etc. The only reasons I can think to need to create a full domain model of the graph is: 1) to calculate positioning 2) for user-input; being able to drag the nodes around (thus updating the Mondrian domain model, for subsequent display updates). (1) _might_ be able to be accomodated by, when generating the picture, only keeping track of various node counts at each level and/or overall bounds of the picture. (2) is what I'm arguing might not be a good trade-off in all cases. Sure, the output is now a static picture instead of something with draggable boxes, but at the cost of speed and scale. Does that make sense? I find Mondrian interesting, thanks for your feedback. - Chris |
> Raster image = bitmap. As you know, bitmaps have the ability, no
> matter whether you render 1 node or 1-billion nodes and edges on it, > it does not take up much more space. It depends on the size of the bitmap :-) > But because Mondrian builds an object-model in RAM (from which it > renders), there is a definite scalability limit in terms of the > number of nodes. Rendering also takes more time due to > object-allocation, etc. In Mondrian, each object is rendered via a shape. There is a number of shapes, one of them is MOImageShape, which displays a bitmap. > The only reasons I can think to need to create a full domain model of > the graph is: > > 1) to calculate positioning > 2) for user-input; being able to drag the nodes around (thus > updating the Mondrian domain model, for subsequent display updates). I am not sure to understand. Suppose you have a domain (e.g., cars, persons and ownership). You can render this domain the way you want. Dragging nodes does not changes your models. It changes the graph built by Mondrian (i.e., instances of MONode and MOEdes). > (1) _might_ be able to be accomodated by, when generating the picture, > only keeping track of various node counts at each level and/or overall > bounds of the picture. Mondrian has a cache mechanism that does exactly that. A node may contains elements. A bitmap is created for you to speed the things up. > (2) is what I'm arguing might not be a good trade-off in all cases. > Sure, the output is now a static picture instead of something with > draggable boxes, but at the cost of speed and scale. > > Does that make sense? I find Mondrian interesting, thanks for your feedback. You can drag and drop inner nodes, the caches is reset without any particular action. What is exactly the problem you're trying to solve? Alexandre -- _,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;: Alexandre Bergel http://www.bergel.eu ^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;. |
On Wed, Dec 22, 2010 at 11:09 AM, Alexandre Bergel <[hidden email]> wrote:
>> Raster image = bitmap. As you know, bitmaps have the ability, no >> matter whether you render 1 node or 1-billion nodes and edges on it, >> it does not take up much more space. > > It depends on the size of the bitmap :-) Not really; if you start with an empty (say, all white), uncompressed bitmap, painting various graphics on it does not change its uncompressed size. > I am not sure to understand. Suppose you have a domain (e.g., cars, persons and ownership). You can render this domain the way you want. Dragging nodes does not changes your models. It changes the graph built by Mondrian (i.e., instances of MONode and MOEdes). Right. I am not talking about the cars and persons, I am talking about "Mondrians domain model", "(i.e., instances of MONode and MOEdes)". If I have 1 million cars and persons and I want Mondrian to render a visualization involving all of them, then it has to create and *keep* 1 million MONodes. I want to do it without keeping them since that cannot scale. Assume, perhaps, that I have my 1 million Cars in a Magma database, and I don't want to persist 1M MONodes just to render a quick picture. > What is exactly the problem you're trying to solve? The (imaginary) problem I'm interested to solve is I have a huge model persisted in Magma, I want to make a visualization based on one million+ objects without creating and keeping 1M MONodes in memory.... Instead, I would like to create each MONode, render it on the raster image, but don't keep it so it can be GC'd. But its representation is already preserved on the bitmap. |
Administrator
|
In reply to this post by cedreek
Check out JUN for Smalltalk.
http://aokilab.kyoto-su.ac.jp/jun/index.html |
Free forum by Nabble | Edit this page |