OpenCL bindings for Squeak


OpenCL bindings for Squeak

Josh Gargus
For the last month or so, I've been working on some OpenCL bindings for Squeak.  I now have something that's worth showing to people.  

The code is on SqueakSource at:
        http://www.squeaksource.com/OpenCL.html

and some rudimentary installation and usage documentation is at:
        https://sites.google.com/site/schwaj/home/opencl-binding-for-squeak

I'll write more over the weekend (hopefully in response to questions and comments), but now it's time for bed.

Cheers,
Josh

Re: OpenCL bindings for Squeak

Igor Stasenko
On 20 February 2010 13:56, Josh Gargus <[hidden email]> wrote:

> For the last month or so, I've been working on some OpenCL bindings for Squeak.  I now have something that's worth showing to people.
>
> The code is on Squeak source at:
>        http://www.squeaksource.com/OpenCL.html
>
> and some rudimentary installation and usage documentation is at:
>        https://sites.google.com/site/schwaj/home/opencl-binding-for-squeak
>
> I'll write more over the weekend (hopefully in response to questions and comments), but now it's time for bed.
>
Cool stuff!
I never got my hands on GPU programming; I've mostly used OpenGL.

> Cheers,
> Josh
>



--
Best regards,
Igor Stasenko AKA sig.


KedamaGPU, etc. (was: "OpenCL bindings for Squeak")

Josh Gargus
In reply to this post by Josh Gargus
While I was hacking away on my OpenCL bindings, I was thinking about what kind of small, fun demos I could include.  When I was first exposed to Squeak, one of the things that hooked me were the various Morphic demos, like curved text, bouncing-atoms, and the magnifier morph, all with the source code right there to learn from.  Jeff Pierce's wonderful port of Alice did the same thing for 3D.

We're at the beginning of a new era in computing, where a $1000 laptop has a CPU with 4 cores and a GPU with dozens.  What will be the new demos that catch the imagination of teenage Squeakers that are growing up with such computers?

The most obvious idea is to integrate Yoshiki's Kedama with OpenCL.  Conceptually, this seems to be a perfect fit, and I think that it would be a lot of fun.  Anybody interested in working on this with me?  Yoshiki?

I have some other ideas, but now I'm looking for yours.  I know that the interests of the Squeak community are broad, and I'm interested in hearing your ideas for small demos that communicate the power and flexibility of our system.

Cheers,
Josh


On Feb 20, 2010, at 3:56 AM, Josh Gargus wrote:

> For the last month or so, I've been working on some OpenCL bindings for Squeak.  I now have something that's worth showing to people.  
>
> The code is on Squeak source at:
> http://www.squeaksource.com/OpenCL.html
>
> and some rudimentary installation and usage documentation is at:
> https://sites.google.com/site/schwaj/home/opencl-binding-for-squeak
>
> I'll write more over the weekend (hopefully in response to questions and comments), but now it's time for bed.
>
> Cheers,
> Josh



Re: OpenCL bindings for Squeak

Josh Gargus
In reply to this post by Igor Stasenko

On Feb 20, 2010, at 4:28 AM, Igor Stasenko wrote:

> On 20 February 2010 13:56, Josh Gargus <[hidden email]> wrote:
>> For the last month or so, I've been working on some OpenCL bindings for Squeak.  I now have something that's worth showing to people.
>>
>> The code is on Squeak source at:
>>        http://www.squeaksource.com/OpenCL.html
>>
>> and some rudimentary installation and usage documentation is at:
>>        https://sites.google.com/site/schwaj/home/opencl-binding-for-squeak
>>
>> I'll write more over the weekend (hopefully in response to questions and comments), but now it's time for bed.
>>
> Cool stuff!
> I never got my hands on GPU programming; I've mostly used OpenGL.

Me too, although I've been reading quite a few academic papers recently, so I have some idea about how folks are using GPGPU, at least in the realm of computer graphics.

I've tried to get into it a few times before, but I always ended up not enjoying spending my free time in C++.  Finally I bit the bullet and did this.

Speaking of OpenGL, one thing that's missing is the ability to share data between OpenGL and OpenCL.  C APIs exist for this, but I haven't integrated them yet.

Cheers,
Josh






Re: KedamaGPU, etc. (was: "OpenCL bindings for Squeak")

Ken G. Brown
In reply to this post by Josh Gargus
Something like this would be very cool:
<http://www.darwinathome.org/>
<http://www.darwinathome.org/blog/>

The code is in Java, though I believe it is available.

Gerald de Jong's work with tensegrity has been remarkable.

Ken G. Brown




Re: KedamaGPU, etc. (was: "OpenCL bindings for Squeak")

Yoshiki Ohshima-2
In reply to this post by Josh Gargus
At Sat, 20 Feb 2010 14:03:37 -0800,
Josh Gargus wrote:
>
> While I was hacking away on my OpenCL bindings, I was thinking about what kind of small, fun demos I could include.  When I was first exposed to Squeak, one of the things that hooked me were the various Morphic demos, like curved text, bouncing-atoms, and the magnifier morph, all with the source code right there to learn from.  Jeff Pierce's wonderful port of Alice did the same thing for 3D.
>
> We're at the beginning of a new era in computing, where a $1000 laptop has a CPU with 4 cores and a GPU with dozens.  What will be the new demos that catch the imagination of teenage Squeakers that are growing up with such computers?
>
> The most obvious idea is to integrate Yoshiki's Kedama with OpenCL.  Conceptually, this seems to be a perfect fit, and I think that it would be a lot of fun.  Anybody interested in working on this with me?  Yoshiki?

  Ah, cool.  Incidentally, I am working on an array processing object
model and language that is supposedly a bit more generalized than
Kedama, and someday I want to hook that up with GPUs.

  But for starters, the existing Kedama is probably easier to adapt
and at least take some advantage of the vector/stream processing.

> I have some other ideas, but now I'm looking for yours.  I know that the interests of the Squeak community are broad, and I'm interested in hearing your ideas for small demos that communicate the power and flexibility of our system.

  As for flexibility (also one of Kedama's points), it would be good to
be able to modify the behavior of particles dynamically at runtime.  I
haven't done my homework yet, but what would be the strategy for
changing code dynamically?

-- Yoshiki


Re: KedamaGPU, etc. (was: "OpenCL bindings for Squeak")

Josh Gargus
In reply to this post by Ken G. Brown

On Feb 20, 2010, at 2:18 PM, Ken G. Brown wrote:

> Something like this would be very cool:
> <http://www.darwinathome.org/>
> <http://www.darwinathome.org/blog/>


A friend of mine does stuff like this at Caltech.  

Unfortunately, this isn't very well suited to parallel computation (at least not in the sense of computing the fitness of a large number of genomes in parallel).  However, there's a lot of physical simulation necessary over an organism's life, and this could be accelerated.  I'm not sure how each organism's AI works (didn't look in enough detail)... if it's something like a neural network, this could also be accelerated.

Not exactly a small project!


>
> The code is in Java, though I believe it is available.
>
> Gerald de Jong's work with tensegrity has been remarkable.


Cool, I haven't looked at that stuff in a long time.

Thanks,
Josh






Re: KedamaGPU, etc. (was: "OpenCL bindings for Squeak")

Josh Gargus
In reply to this post by Yoshiki Ohshima-2
On Feb 20, 2010, at 2:48 PM, Yoshiki Ohshima wrote:

> At Sat, 20 Feb 2010 14:03:37 -0800,
> Josh Gargus wrote:
>>
>> While I was hacking away on my OpenCL bindings, I was thinking about what kind of small, fun demos I could include.  When I was first exposed to Squeak, one of the things that hooked me were the various Morphic demos, like curved text, bouncing-atoms, and the magnifier morph, all with the source code right there to learn from.  Jeff Pierce's wonderful port of Alice did the same thing for 3D.
>>
>> We're at the beginning of a new era in computing, where a $1000 laptop has a CPU with 4 cores and a GPU with dozens.  What will be the new demos that catch the imagination of teenage Squeakers that are growing up with such computers?
>>
>> The most obvious idea is to integrate Yoshiki's Kedama with OpenCL.  Conceptually, this seems to be a perfect fit, and I think that it would be a lot of fun.  Anybody interested in working on this with me?  Yoshiki?
>
>  Ah, cool.  Incidentally, I am working on an array processing object
> model and language that is supposedly a bit more generalized than
> Kedama, and someday I want to hook that up with GPUs.


Great!  I just downloaded your dissertation today... is it still authoritative, or are there some aspects of the system that are covered better in subsequent documents?


>
>  But for starters, the existing Kedama is probably easier to adapt
> and at least take some advantage of the vector/stream processing.
>
>> I have some other ideas, but now I'm looking for yours.  I know that the interests of the Squeak community are broad, and I'm interested in hearing your ideas for small demos that communicate the power and flexibility of our system.
>
>  As for flexibility (also one of Kedama's points), it would be good to
> be able to modify the behavior of particles dynamically at runtime.  I
> haven't done my homework yet, but what would be the strategy for
> changing code dynamically?


In current GPGPU architectures, execution is most efficient when items in the same "work group" follow the same code path.   For example, say that you have particles representing ants that have 10 possible different behaviors specified by an integer from 1 to 10 (and for simplicity, say that each of these behaviors takes 1000 clock cycles to run).  Further, let's say that you naively write this as a switch statement in the OpenCL code, so a different code path is dynamically chosen depending on the behavior index for that ant.  Current architectures are inefficient in the case where ants in the same work-group take different branches through the code.  If all ants have the same behavior, it will take 1000 clock cycles.  If the ants use 2 of the possible 10 behaviors, it will take 2000 clock cycles.  In the worst case (ants use all 10 behaviors), it will take 10000 clock cycles.

The GPU can execute multiple work-groups at the same time (approximately 16 today).  So, if you have some way of grouping ants with the same current behavior into the same work-group, then you can improve efficiency greatly compared to assigning them randomly to work-groups.  Of course, this assignment will have overhead.
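To make the arithmetic above concrete, here is a toy cost model in Python. It is a sketch under stated assumptions, not a GPU simulator: it follows the simplification used above that a work-group pays 1000 cycles for each distinct behavior its ants take (on real hardware, divergence is paid per warp or wavefront, a fixed-width subset of a work-group), and it simply sums the cost of all work-groups instead of modelling how many run at once. The function names are hypothetical.

```python
from typing import List

CYCLES_PER_BEHAVIOR = 1000  # the per-behavior figure from the example above


def work_group_cycles(behaviors: List[int]) -> int:
    """Toy model: a work-group runs one pass per distinct behavior present,
    because divergent branches are executed one after another."""
    return CYCLES_PER_BEHAVIOR * len(set(behaviors))


def total_cycles(ant_behaviors: List[int], group_size: int) -> int:
    """Sum the cost of consecutive work-groups of `group_size` ants.
    Summing ignores concurrency, but keeps the relative comparison."""
    total = 0
    for start in range(0, len(ant_behaviors), group_size):
        total += work_group_cycles(ant_behaviors[start:start + group_size])
    return total


def grouped_by_behavior(ant_behaviors: List[int]) -> List[int]:
    """Reorder ants so that ants with the same behavior are adjacent."""
    return sorted(ant_behaviors)


# 64 ants with behaviors 1..10 interleaved, in work-groups of 16.
ants = [b % 10 + 1 for b in range(64)]
print(total_cycles(ants, 16))                       # 40000
print(total_cycles(grouped_by_behavior(ants), 16))  # 13000
```

Sorting is only the simplest way to group ants; its cost is the "overhead" mentioned above and would have to be paid again whenever behaviors change.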

The above assumes that all behaviors are already known.  You're probably also interested in code-generation.  To do this, you could synthesize a String containing the new source-code that you want to use, and upload the compiled code before running the next iteration of the simulation.  There's currently no way to generate binary code.  There's no fundamental technical reason for this, but OpenCL is immature at this point, and it will be years before the vendors can agree upon a suitable format.
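The "synthesize a String" approach might look roughly like the sketch below. It is illustrative only: `make_behavior_kernel`, the `heading` buffer, and the variable names `h` and `i` are hypothetical, and the snippet stops at producing OpenCL C source text. Compiling it would go through the binding's counterpart of the OpenCL C API calls `clCreateProgramWithSource` and `clBuildProgram`; those steps are not shown because the Squeak binding's own selectors are not described here.

```python
def make_behavior_kernel(name: str, update_expr: str) -> str:
    """Return OpenCL C source for a kernel that rewrites each ant's heading.

    `update_expr` is an OpenCL C expression that may use `h` (the ant's
    current heading) and `i` (the ant's global index).  These names and the
    `heading` buffer are hypothetical choices for this sketch.
    """
    if not name.isidentifier():  # rough check that `name` is a legal identifier
        raise ValueError(f"not a valid kernel name: {name!r}")
    return (
        f"__kernel void {name}(__global float *heading)\n"
        "{\n"
        "    size_t i = get_global_id(0);\n"
        "    float h = heading[i];\n"
        f"    heading[i] = {update_expr};\n"
        "}\n"
    )


# A new behavior is just a new string, rebuilt before the next iteration.
source = make_behavior_kernel("turn_left", "h + 10.0f")
print(source)
```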

Cheers,
Josh







Re: KedamaGPU, etc. (was: "OpenCL bindings for Squeak")

Yoshiki Ohshima-2
At Sat, 20 Feb 2010 16:54:19 -0800,
Josh Gargus wrote:

>
> On Feb 20, 2010, at 2:48 PM, Yoshiki Ohshima wrote:
>
> > At Sat, 20 Feb 2010 14:03:37 -0800,
> > Josh Gargus wrote:
> >>
> >> While I was hacking away on my OpenCL bindings, I was thinking about what kind of small, fun demos I could include.  When I was first exposed to Squeak, one of the things that hooked me were the various Morphic demos, like curved text, bouncing-atoms, and the magnifier morph, all with the source code right there to learn from.  Jeff Pierce's wonderful port of Alice did the same thing for 3D.
> >>
> >> We're at the beginning of a new era in computing, where a $1000 laptop has a CPU with 4 cores and a GPU with dozens.  What will be the new demos that catch the imagination of teenage Squeakers that are growing up with such computers?
> >>
> >> The most obvious idea is to integrate Yoshiki's Kedama with OpenCL.  Conceptually, this seems to be a perfect fit, and I think that it would be a lot of fun.  Anybody interested in working on this with me?  Yoshiki?
> >
> >  Ah, cool.  Incidentally, I am working on an array processing object
> > model and language that is supposedly a bit more generalized than
> > Kedama, and someday I want to hook that up with GPUs.
>
>
> Great!  I just downloaded your dissertation today... is it still authoritative, or are there some aspects of the system that are covered better in subsequent documents?

  It matches the implementation fairly well.  The most relevant
part in this context is the plugin code, but that is not explained
anywhere.

> >  As for flexibility (also one of Kedama's points), it would be good to
> > be able to modify the behavior of particles dynamically at runtime.  I
> > haven't done my homework yet, but what would be the strategy for
> > changing code dynamically?
>
>
> In current GPGPU architectures, execution is most efficient when
> items in the same "work group" follow the same code path.   For
> example, say that you have particles representing ants that have 10
> possible different behaviors specified by an integer from 1-10 (and
> for simplicity, say that each of these behaviors takes 1000 clock
> cycles to run).  Further, let's say that you naively write this as a
> switch-statement in the OpenCL code... a different code-path is
> dynamically chosen depending on the behavior-index for that ant.
> Current architectures are inefficient in the case where ants in the
> same work-group take different branches through the code.  If all
> ants have the same behavior, it will take 1000 clock cycles.  If the
> ants use 2 of the possible 10 behaviors, it will take 2000 clock
> cycles.  In the worst case (ants use all 10 behaviors), it will
> take 10000 clock cycles.

  Right.  I had gone through some CUDA documents and this part appears
the same.

> The GPU can execute multiple work-groups at the same time
> (approximately 16 today).  So, if you have some way of grouping ants
> with the same current behavior into the same work-group, then you
> can improve efficiency greatly compared to assigning them randomly
> to work-groups.  Of course, this assignment will have overhead.
>
> The above assumes that all behaviors are already known.  You're
> probably also interested in code-generation.  To do this, you could
> synthesize a String containing the new source-code that you want to
> use, and upload the compiled code before running the next iteration
> of the simulation.  There's currently no way to generate binary
> code.  There's no fundamental technical reason for this, but OpenCL
> is immature at this point, and it will be years before the vendors
> can agree upon a suitable format.

  Instead of some form of code generation, just having a fixed set of
primitives and calling them from some "interpreted code" would be a
workable strategy.  Most of the divergent behavior is a kind of
selective write-back: a line of expression is executed for all turtles
but the final assignment is masked by a boolean vector.
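The selective write-back idea can be sketched in plain Python: every turtle evaluates the same expression, and a boolean mask decides which results are written back, so no turtle needs to take a different branch. This is an illustration under assumptions, not Kedama's implementation; `forward` and `masked_assign` are hypothetical names. In OpenCL C, the per-element choice could map onto the built-in `select` function.

```python
from typing import List


def forward(xs: List[float], step: float) -> List[float]:
    """Hypothetical fixed primitive: every turtle computes a candidate new x."""
    return [x + step for x in xs]


def masked_assign(target: List[float], values: List[float],
                  mask: List[bool]) -> List[float]:
    """Return a copy of `target` in which positions whose mask is True
    take the corresponding entry of `values`."""
    if not (len(target) == len(values) == len(mask)):
        raise ValueError("target, values and mask must have the same length")
    return [v if m else t for t, v, m in zip(target, values, mask)]


# "Only turtles with x < 10 move forward by 1": the expression runs for all
# turtles, and the mask controls which results are kept.
xs = [0.0, 5.0, 10.0, 15.0]
mask = [x < 10.0 for x in xs]
xs = masked_assign(xs, forward(xs, 1.0), mask)
print(xs)  # [1.0, 6.0, 10.0, 15.0]
```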

  There was automatic sequentialization when potentially multiple
turtles are writing into the same variable at one "step".  This was
needed semantically.  I wonder if this automatic (and somewhat eager)
serial execution is good for a GPU implementation or not.

-- Yoshiki


Re: KedamaGPU, etc. (was: "OpenCL bindings for Squeak")

Josh Gargus

On Feb 21, 2010, at 11:52 AM, Yoshiki Ohshima wrote:

> At Sat, 20 Feb 2010 16:54:19 -0800,
> Josh Gargus wrote:
>>
>> On Feb 20, 2010, at 2:48 PM, Yoshiki Ohshima wrote:
>>
>>> At Sat, 20 Feb 2010 14:03:37 -0800,
>>> Josh Gargus wrote:
>>>>
>>>> While I was hacking away on my OpenCL bindings, I was thinking about what kind of small, fun demos I could include.  When I was first exposed to Squeak, one of the things that hooked me were the various Morphic demos, like curved text, bouncing-atoms, and the magnifier morph, all with the source code right there to learn from.  Jeff Pierce's wonderful port of Alice did the same thing for 3D.
>>>>
>>>> We're at the beginning of a new era in computing, where a $1000 laptop has a CPU with 4 cores and a GPU with dozens.  What will be the new demos that catch the imagination of teenage Squeakers that are growing up with such computers?
>>>>
>>>> The most obvious idea is to integrate Yoshiki's Kedama with OpenCL.  Conceptually, this seems to be a perfect fit, and I think that it would be a lot of fun.  Anybody interested in working on this with me?  Yoshiki?
>>>
>>> Ah, cool.  Incidentally, I am working on an array processing object
>>> model and language that is supposedly a bit more generalized than
>>> Kedama, and someday I want to hook that up with GPUs.
>>
>>
>> Great!  I just downloaded your dissertation today... is it still authoritative, or are there some aspects of the system that are covered better in subsequent documents?
>
>  It matches the implementation fairly well.  The most relevant
> part in this context is the plugin code, but that is not explained
> anywhere.


At least the expert is available. :-)


>
>>> As for flexibility (also one of Kedama's points), it would be good to
>>> be able to modify the behavior of particles dynamically at runtime.  I
>>> haven't done my homework yet, but what would be the strategy for
>>> changing code dynamically?
>>
>>
>> In current GPGPU architectures, execution is most efficient when
>> items in the same "work group" follow the same code path.   For
>> example, say that you have particles representing ants that have 10
>> possible different behaviors specified by an integer from 1-10 (and
>> for simplicity, say that each of these behaviors takes 1000 clock
>> cycles to run).  Further, let's say that you naively write this as a
>> switch-statement in the OpenCL code... a different code-path is
>> dynamically chosen depending on the behavior-index for that ant.
>> Current architectures are inefficient in the case where ants in the
>> same work-group take different branches through the code.  If all
>> ants have the same behavior, it will take 1000 clock cycles.  If the
>> ants use 2 of the possible 10 behaviors, it will take 2000 clock
>> cycles.  In the worst case (ants use all 10 behaviors), it will
>> take 10000 clock cycles.
>
>  Right.  I had gone through some CUDA documents and this part appears
> the same.


Yes, it's built into the hardware.


>
>> The GPU can execute multiple work-groups at the same time
>> (approximately 16 today).  So, if you have some way of grouping ants
>> with the same current behavior into the same work-group, then you
>> can improve efficiency greatly compared to assigning them randomly
>> to work-groups.  Of course, this assignment will have overhead.
>>
>> The above assumes that all behaviors are already known.  You're
>> probably also interested in code-generation.  To do this, you could
>> synthesize a String containing the new source-code that you want to
>> use, and upload the compiled code before running the next iteration
>> of the simulation.  There's currently no way to generate binary
>> code.  There's no fundamental technical reason for this, but OpenCL
>> is immature at this point, and it will be years before the vendors
>> can agree upon a suitable format.
>
>  Instead of some form of code generation, just having a fixed set of
> primitives and calling them from some "interpreted code" would be a
> workable strategy.  Most of the divergent behavior is a kind of
> selective write-back: a line of expression is executed for all turtles
> but the final assignment is masked by a boolean vector.


That sounds very doable.


>
>  There was automatic sequentialization when potentially multiple
> turtles are writing into the same variable at one "step".  This was
> needed semantically.  I wonder if this automatic (and somewhat eager)
> serial execution is good for a GPU implementation or not.


Nope, it isn't.  Why is it necessary for the semantics?  Which part of your dissertation describes this?

Cheers,
Josh






Re: KedamaGPU, etc. (was: "OpenCL bindings for Squeak")

Yoshiki Ohshima-2
At Sun, 21 Feb 2010 13:31:00 -0800,
Josh Gargus wrote:
>
> >  There was automatic sequentialization when potentially multiple
> > turtles are writing into the same variable at one "step".  This was
> > needed semantically.  I wonder if this automatic (and somewhat eager)
> > serial execution is good for a GPU implementation or not.
>
> Nope, it isn't.  Why is it necessary for the semantics?  Which part of your dissertation describes this?

  I have a strong mental block about looking at it, but the goal was
to avoid introducing an explicit "reduce" operator or "do in order"
block, and instead make the system do the reasonable thing.

  Let us say we have a bunch of turtles and there is a line:

  patch's value | increase by | 1

to make a map of turtles (each patch cell should hold the number of
turtles at the cell), it should just work without needing the
programmer to resolve the read-write dependency explicitly.
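The patch-counting example shows why the semantics call for some form of serialization. The sketch below compares three ways of running "patch's value | increase by | 1" for several turtles. It is plain Python, the function names are hypothetical, and "parallel" execution is only simulated by doing every read before any write. On a GPU, the usual alternatives to eager serialization would be atomic updates (such as OpenCL's `atomic_add` on a global `int` buffer) or a separate reduction step, modelled here by `reduced_increment`.

```python
from collections import Counter
from typing import List


def naive_parallel_increment(patches: List[int], turtle_patch: List[int]) -> List[int]:
    """Simulate every turtle doing read / add 1 / write "at the same time":
    all reads happen before any write, so turtles sharing a patch
    overwrite each other's result and increments are lost."""
    reads = [patches[p] for p in turtle_patch]   # every turtle reads first
    result = list(patches)
    for p, old in zip(turtle_patch, reads):      # then every turtle writes
        result[p] = old + 1
    return result


def sequential_increment(patches: List[int], turtle_patch: List[int]) -> List[int]:
    """Serialize the updates: each turtle sees the previous turtle's write."""
    result = list(patches)
    for p in turtle_patch:
        result[p] += 1
    return result


def reduced_increment(patches: List[int], turtle_patch: List[int]) -> List[int]:
    """Count turtles per patch first (a reduction), then add each count once."""
    counts = Counter(turtle_patch)
    return [value + counts[i] for i, value in enumerate(patches)]


patches = [0, 0, 0, 0]          # four patch cells
turtle_patch = [0, 2, 2, 2, 3]  # patch index of each of five turtles
print(naive_parallel_increment(patches, turtle_patch))  # [1, 0, 1, 1]
print(sequential_increment(patches, turtle_patch))      # [1, 0, 3, 1]
print(reduced_increment(patches, turtle_patch))         # [1, 0, 3, 1]
```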

-- Yoshiki


Re: OpenCL bindings for Squeak

askoh
In reply to this post by Josh Gargus
Can OpenCL run on a single core machine without any GPU? No parallel programming is expected. Just wanting to run the code.

Thanks,
Aik-Siong Koh

Re: OpenCL bindings for Squeak

Josh Gargus
On Windows, AMD's OpenCL implementation supports execution on both the GPU and CPU.  Not sure if it will work if you have an Intel processor, or no GPU (no reason I can see that it wouldn't, but they might disable it for some reason).  I use Nvidia's implementation, which doesn't support CPU execution.

Cheers,
Josh


On Jun 7, 2010, at 2:25 PM, askoh wrote:

>
> Can OpenCL run on a single core machine without any GPU? No parallel
> programming is expected. Just wanting to run the code.
>
> Thanks,
> Aik-Siong Koh
> --
> View this message in context: http://forum.world.st/OpenCL-bindings-for-Squeak-tp1562770p2246602.html
> Sent from the Squeak - Dev mailing list archive at Nabble.com.
>