I am involved in some Hadoop deployments and there is a very interesting possibility for Pharo in that ecosystem.
Namely, there is YARN, a scheduler for distributing computation across a cluster of nodes.

It is possible to deploy all kinds of technologies on the nodes (e.g. Python, R, Java), and a Pharo image and VM (in headless mode) could be deployed as well.

The deployed node can communicate back to what is called the ApplicationMaster via REST callbacks (easy game in Pharo).

There is also a Hadoop component named ZooKeeper that acts as a distributed configuration repository. It has a C API (for us, that means FFI or a plugin: http://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html), and one can talk to it with REST too (https://github.com/apache/zookeeper/tree/trunk/src/contrib/rest).

Given that we can also make some Java calls (using the JNI module with 32-bit Java), we can integrate well enough with YARN, I'd say.

There is also another very nice project, Slider (on YARN), which is about deploying stuff in an elastic way (see http://slider.incubator.apache.org/).

The next logical step is to have Docker containers (containing a Pharo stack) deployed dynamically on the cluster using Slider (like this: http://www.slideshare.net/hortonworks/docker-on-slider-45493303).

The first step here would be a basic YARN-Pharo application and a PoC for talking to ZooKeeper.

This would open interesting doors for Pharo given its strengths, even more so once we get a 64-bit VM.

What is cool with Pharo is that an image can be very small and self-contained, unlike a Java application (which has tons of JAR files attached).

Data on HDFS can be accessed through NFSv3, so we can go that route. There is also a REST API to it (https://hadoop.apache.org/docs/r1.0.4/webhdfs.html).

Tell me what you think!

Phil
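For anyone who wants to poke at the WebHDFS route mentioned above, here is a rough sketch of what the REST calls look like. It is in Python only because that is easy to try outside an image; the same GET requests could be issued from Pharo with an HTTP client such as Zinc. The helper names `webhdfs_url` and `file_names` are made up for illustration, and the default port 50070 is an assumption (the usual NameNode HTTP port on Hadoop 1.x/2.x clusters; yours may differ).

```python
import json
from urllib.parse import quote, urlencode


def webhdfs_url(host, path, op, port=50070, params=None):
    """Build a WebHDFS URL: http://<host>:<port>/webhdfs/v1/<path>?op=<OP>.

    `port=50070` is an assumed default, adjust it for your cluster.
    """
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute")
    # `op` always comes first, followed by any extra query parameters.
    query = urlencode({"op": op, **(params or {})})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


def file_names(liststatus_json):
    """Return the entry names from the JSON body of an op=LISTSTATUS reply."""
    statuses = json.loads(liststatus_json)["FileStatuses"]["FileStatus"]
    return [status["pathSuffix"] for status in statuses]


if __name__ == "__main__":
    # Hypothetical host and path, for illustration only.
    print(webhdfs_url("namenode.example.org", "/user/phil/data", "LISTSTATUS"))
```

Fetching the URL built for `LISTSTATUS` with any HTTP client should return a JSON document whose body can be passed to `file_names`; `op=OPEN` on a file path returns the file contents instead.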
Having Pharo playing there seems to be a good place to show its strengths.
However, you mentioned too many acronyms of technologies I don't understand (but hear a lot about).

The only thing I can agree with is that the self-contained nature of Pharo is a true advantage when deploying extra nodes. It's not only fast, but also pretty lightweight compared with behemoths such as Java.

Regards,

Esteban A. Maringolo

2015-04-29 9:57 GMT-03:00 [hidden email] <[hidden email]>:
> I am involved in some Hadoop deployments and there is a very interesting
> possibility for Pharo in that ecosystem.
> [...]
And (as I've picked up listening to other conversations) with Sista doing hotspot optimisation *in-image*, deployed images will be able to start hot rather than spending cycles determining hot spots after startup.
cheers -ben

On Wed, Apr 29, 2015 at 9:55 PM, Esteban A. Maringolo <[hidden email]> wrote:
> Having Pharo playing there seems to be a good place to show its strengths.
Definitely interesting!
For getting a start on this, one can download this:
All of the Hadoop stuff is in there, including YARN, ZooKeeper, etc.

I'll start by writing a YARN app that runs one Pharo node on the cluster and move on from there. Once that is done, more nodes. Then REST callbacks. At some point, Pharo deployed in a Docker container.

Here is how to write a YARN application:

Phil

On Thu, Apr 30, 2015 at 9:32 AM, Marcus Denker <[hidden email]> wrote:
Phil
I asked the head of the engineers whether one of the guys working on databases might be interested in a one-month project. For now I have no idea whether they will like it or have the time.

Stef

On 29/4/15 14:57, [hidden email] wrote:
> I am involved in some Hadoop deployments and there is a very
> interesting possibility for Pharo in that ecosystem.
> [...]
Fingers crossed.

On 30 Apr 2015 10:45, "stepharo" <[hidden email]> wrote:
Phil