
Pharo and Hadoop

philippeback

Pharo and Hadoop

I am involved in some Hadoop deployments, and there is a very interesting possibility for Pharo in that ecosystem.

Namely, Hadoop includes YARN, a scheduler for distributing computation across a cluster of nodes.

It is possible to deploy all kinds of technologies on the nodes (e.g. Python, R, Java) and Pharo images and VM (in headless mode) could be deployed as well.

The deployed node can communicate back to what is called an ApplicationMaster via REST callbacks (an easy game in Pharo).

There is also a component in the Hadoop ecosystem named ZooKeeper, which acts as a distributed configuration repository. It has a C API (from Pharo, that means FFI or a plugin - http://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html).

One can talk to it with REST too (https://github.com/apache/zookeeper/tree/trunk/src/contrib/rest)
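
To make the REST route a bit more concrete, here is a minimal Python sketch (Python only for brevity; the same URLs could be requested from a Pharo image over HTTP) that builds znode URLs for that REST gateway. The `/znodes/v1` base path, the port 9998, and the `view=children` query are assumptions based on my recollection of the gateway's spec and should be checked against the SPEC file in the linked directory; the helper name `znode_url` and the `/pharo/config` path are made up for the example.

```python
from urllib.parse import quote


def znode_url(znode_path, host="localhost", port=9998, view=None):
    """Build a URL for a znode on the ZooKeeper REST gateway.

    ASSUMPTION: the gateway serves znodes under /znodes/v1 on port 9998.
    Verify both against the gateway's own documentation before relying on them.
    """
    if not znode_path.startswith("/"):
        raise ValueError("znode path must be absolute, e.g. /pharo/config")
    url = "http://{}:{}/znodes/v1{}".format(host, port, quote(znode_path))
    if view is not None:
        # ASSUMPTION: 'view=children' asks for the child list of the znode.
        url += "?view=" + quote(view)
    return url


# Hypothetical znode holding configuration for Pharo workers.
print(znode_url("/pharo/config"))
print(znode_url("/pharo", view="children"))
```

A Pharo worker would issue a GET on the first URL to read its configuration and on the second to discover sibling znodes.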

Given that we can also make some Java calls (using the JNI module with a 32-bit Java), we can integrate well enough with YARN, I'd say.

There is also another very nice project, Slider (on YARN).

It is about deploying applications in an elastic way (see http://slider.incubator.apache.org/).

The next logical step is to have Docker containers (containing a Pharo stack) deployed dynamically on the cluster using Slider (like this: http://www.slideshare.net/hortonworks/docker-on-slider-45493303).

The first step here would be to have a basic YARN-Pharo application and a PoC for talking to ZooKeeper.

This would open interesting doors for Pharo, given its strengths.

Even more so once we get a 64-bit VM.

What is cool about Pharo is that an image can be very small and self-contained, unlike Java applications (which come with tons of JAR files attached).

Data on HDFS can be accessed through NFSv3, so we can go that route.

There is also a REST API to it (https://hadoop.apache.org/docs/r1.0.4/webhdfs.html)
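
As a rough illustration of that REST route, here is a minimal Python sketch (Python only for brevity; the same URLs could be requested from a Pharo image over HTTP) that builds WebHDFS URLs. The `http://<host>:<port>/webhdfs/v1/<path>?op=...` layout and the OPEN and LISTSTATUS operations come from the WebHDFS documentation linked above; the helper name `webhdfs_url`, the host name, the file paths, and the default port 50070 are assumptions made for the example.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, path, op, port=50070, params=None):
    """Build a WebHDFS URL: http://host:port/webhdfs/v1/<path>?op=<OP>&...

    ASSUMPTION: 50070 is used as the NameNode HTTP port; adjust it to
    whatever your cluster is configured with.
    """
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute, e.g. /user/phil/data.csv")
    query = {"op": op.upper()}
    query.update(params or {})
    return "http://{}:{}/webhdfs/v1{}?{}".format(
        host, port, quote(path), urlencode(query)
    )


# Hypothetical host and paths, for illustration only.
print(webhdfs_url("namenode.example.com", "/user/phil/data.csv", "open"))
print(webhdfs_url("namenode.example.com", "/user/phil", "liststatus",
                  params={"user.name": "phil"}))
```

A GET on the first URL reads a file and a GET on the second lists a directory, so a Pharo node only needs an HTTP client to reach the data.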

Tell me what you think!

Phil

Esteban A. Maringolo

Re: Pharo and Hadoop

Having Pharo play there seems like a good way to show its strengths.

However, you mentioned too many acronyms of technologies I don't
understand (but hear a lot about).

The only thing I can agree with is that the self-contained nature of
Pharo is a true advantage when deploying extra nodes. It's not only
fast, but also pretty lightweight compared with behemoths such as
Java.

Regards,

Esteban A. Maringolo

Ben Coman

Re: Pharo and Hadoop

And (as I've picked up listening to other conversations), with Sista doing hotspot optimisation *in-image*, deployed images will be able to start hot rather than spending cycles determining hot spots after startup.

cheers -ben

Marcus Denker-4

Re: Pharo and Hadoop

In reply to this post by philippeback

Definitely interesting!

philippeback

Re: Pharo and Hadoop

To get started on this, one can download the Hortonworks Sandbox:

http://hortonworks.com/products/hortonworks-sandbox/

All of the Hadoop stack is in there, including YARN, ZooKeeper, etc.

I'll start by writing a YARN app that runs one Pharo node on the cluster and move on from there.

Once that is done, more nodes.

Then REST callbacks.

At some point, Pharo deployed in a Docker container.

Here is how to write a YARN application:

http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html

Phil

stepharo

Re: Pharo and Hadoop

In reply to this post by philippeback

Phil

I asked the head of engineering whether one of the guys working on databases
would be interested in a one-month project. I have no idea yet whether they
will like it or have the time.

Stef

philippeback

Re: Pharo and Hadoop

Fingers crossed.
