Using VirtualGPU

SergeStinckwich
Dear all,

Just to let you know, Cheikhou (in CC) is starting a student
internship in my lab.
He will work on epidemiological modelling with KENDRICK
(http://ummisco.github.io/kendrick/), the platform we are developing
to analyse and visualise the behaviour of disease models.

We would first like to implement a GPU version of the Gillespie
Stochastic Simulation Algorithm (GSSA),
http://en.wikipedia.org/wiki/Gillespie_algorithm, and after that
SPH simulations:
https://en.wikipedia.org/wiki/Smoothed-particle_hydrodynamics
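
For readers new to the GSSA, here is a minimal CPU sketch of Gillespie's direct method. It assumes a simple SIR epidemic model (infection S + I -> 2I with propensity beta*S*I/N, recovery I -> R with propensity gamma*I); the model, the function names and the parameter values are illustrative assumptions, not part of KENDRICK or VirtualGPU. Each call produces one independent stochastic trajectory, so one plausible GPU mapping would be one trajectory per work-item.

```python
import bisect
import itertools
import math
import random


def sir_reactions(beta, gamma, n):
    """Hypothetical SIR model: (propensity function, state change) per reaction."""
    return [
        # Infection S + I -> 2I, propensity beta * S * I / N
        (lambda s, i, r: beta * s * i / n, (-1, +1, 0)),
        # Recovery I -> R, propensity gamma * I
        (lambda s, i, r: gamma * i, (0, -1, +1)),
    ]


def gillespie_direct(state, reactions, t_max, rng):
    """Gillespie's direct method; returns the trajectory as (t, state) pairs."""
    t = 0.0
    trajectory = [(t, state)]
    while True:
        propensities = [a(*state) for a, _ in reactions]
        cumulative = list(itertools.accumulate(propensities))
        a0 = cumulative[-1]
        if a0 <= 0.0:
            break  # no reaction can fire any more (e.g. no infected left)
        # Time to the next reaction is exponentially distributed with rate a0.
        tau = -math.log(1.0 - rng.random()) / a0
        if t + tau > t_max:
            break
        t += tau
        # Pick reaction j with probability propensities[j] / a0
        # (bisect_right skips reactions whose propensity is zero).
        j = bisect.bisect_right(cumulative, rng.random() * a0)
        change = reactions[min(j, len(reactions) - 1)][1]
        state = tuple(x + dx for x, dx in zip(state, change))
        trajectory.append((t, state))
    return trajectory
```

For example, `gillespie_direct((990, 10, 0), sir_reactions(0.3, 0.1, 1000), 100.0, random.Random(1))` returns one trajectory of `(time, (S, I, R))` pairs; seeding the generator makes the run reproducible.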

We are looking at the code of OpenCL and VirtualGPU written by Ronie.
What we have understood so far:
- OpenCL package: low-level support for interfacing OpenCL
kernels with Pharo
- VirtualGPU: a high-level API on top of OpenCL that eases the
task of people who want to use OpenCL. At the moment, VirtualGPU
provides high-level operations on matrices and images.

@Ronie: What is not yet clear to us: when you build a VirtualGPU
program with the DSL, is there a communication overhead every time
you execute a VirtualGPU instruction, or are all the instructions
sent at once and run on the GPU?

In our context, to build a GSSA algorithm I guess we just have to
combine some existing VGPU instructions (matrix computations), but
for SPH simulations we will have to provide our own instructions.
Is there any documentation on adding our own kernels and instructions?
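
As an example of the kind of custom kernel SPH requires, here is a minimal CPU reference sketch of the SPH density summation rho_i = sum_j m_j W(|r_i - r_j|, h), using the 3D poly6 smoothing kernel popularised by Müller et al. (2003). The brute-force O(n^2) loop and the function names are illustrative assumptions; a GPU version would presumably compute one rho_i per work-item and use a neighbour-search structure instead of visiting every pair.

```python
import math


def poly6(r, h):
    """3D poly6 smoothing kernel: 315 / (64 pi h^9) * (h^2 - r^2)^3 for r <= h."""
    if r > h:
        return 0.0
    return 315.0 / (64.0 * math.pi * h ** 9) * (h * h - r * r) ** 3


def sph_densities(positions, masses, h):
    """Brute-force density summation; each particle also counts itself (r = 0)."""
    densities = []
    for p_i in positions:
        rho = 0.0
        for p_j, m_j in zip(positions, masses):
            rho += m_j * poly6(math.dist(p_i, p_j), h)
        densities.append(rho)
    return densities
```

Writing such a CPU reference first gives a baseline to compare the results of a hand-written OpenCL kernel against.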

I know that other people at INRIA (Stéphane?) are interested in GPUs.
Could we join our efforts and share what we are doing?

Regards,
--
Serge Stinckwich
UCBN & UMI UMMISCO 209 (IRD/UPMC)
Every DSL ends up being Smalltalk
http://www.doesnotunderstand.org/

Re: Using VirtualGPU

Alain Busser
Hi Serge,

I did some experiments with GPU computing in JavaScript. As I understood it, the computation stays inside the GPU the whole time, so you just send the data and the programs once and let the GPU compute for you. The problem, though, is reading the data back once they are computed. With WebGL it seems impossible; with WebCL it is possible but not easy. Hence the choice of OpenCL, I guess.

I describe how I computed the powers of a Markov matrix here: http://revue.sesamath.net/spip.php?article651 (in particular, click on "webGL sans three.js"). I also did some experiments here: http://irem.univ-reunion.fr/spip.php?article797 (but those use three.js, if I remember correctly).
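
The matrix-power computation described above can be sketched on the CPU with exponentiation by squaring, which needs O(log n) matrix products instead of n - 1. This plain-Python version only illustrates the arithmetic; it is an assumption-level sketch, not the WebGL code from the linked article. Matrices are lists of rows, and each row of a Markov matrix sums to 1.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(p, exponent):
    """P**exponent by repeated squaring (exponent >= 0)."""
    n = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    base = p
    while exponent > 0:
        if exponent & 1:          # current bit set: multiply this power in
            result = mat_mul(result, base)
        base = mat_mul(base, base)  # P, P^2, P^4, P^8, ...
        exponent >>= 1
    return result
```

For P = [[0.9, 0.1], [0.5, 0.5]], high powers have both rows close to the stationary distribution (5/6, 1/6).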

Happy reading, and yes, I am interested in these subjects.

Alain

In reply to Serge Stinckwich's message of Tue, May 12, 2015 at 7:48 PM.

Re: Using VirtualGPU

Ronie Salgado
Hi all,

Sorry for answering so late; I am busy at ICSE, where tomorrow I have to defend my poster for the Student Research Competition. The topic is profiling over the OpenCL bindings.

> We are looking at the code of OpenCL and VirtualGPU written by Ronie.
> What we have understood so far:
> - OpenCL package: low-level support for interfacing OpenCL
> kernels with Pharo
> - VirtualGPU: a high-level API on top of OpenCL that eases the
> task of people who want to use OpenCL. At the moment, VirtualGPU
> provides high-level operations on matrices and images.

This is correct.

> @Ronie: What is not yet clear to us: when you build a VirtualGPU
> program with the DSL, is there a communication overhead every time
> you execute a VirtualGPU instruction, or are all the instructions
> sent at once and run on the GPU?

The DSL is actually an abstraction over the OpenCL API. Each operation is stored in a simple intermediate representation that is used to call a single OpenCL kernel.

There is no overhead in terms of memory transfers between intermediate operations, because the data are kept on the GPU the whole time. There is, however, an overhead in terms of kernel dispatching. For example, the expression a + b * 0.5 in the VGPU DSL is interpreted as the following pseudocode:

 temp := opencl invokeKernel: 'add' with: a with: b.
 temp2 := opencl invokeKernel: 'mulScalar' with: temp with: 0.5.
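
The per-operation dispatch can be sketched as a tiny expression tree in which every node becomes one kernel invocation producing its own temporary buffer. Note that Smalltalk binary messages are evaluated strictly left to right, so a + b * 0.5 means (a + b) * 0.5, which is why 'add' runs before 'mulScalar'. The Node class, the KERNELS table and the dispatch counter are hypothetical Python stand-ins, not VGPU's actual classes.

```python
from dataclasses import dataclass

# Hypothetical CPU stand-ins for the OpenCL kernels named in the pseudocode.
KERNELS = {
    "add": lambda x, y: [p + q for p, q in zip(x, y)],
    "mulScalar": lambda x, s: [p * s for p in x],
}


@dataclass
class Node:
    """One IR node: a kernel name plus its operands (Nodes, buffers or scalars)."""
    kernel: str
    operands: tuple


def evaluate(node, stats):
    """Evaluate bottom-up: one kernel dispatch, and one new buffer, per node."""
    if not isinstance(node, Node):
        return node  # leaf: an input buffer or a scalar
    args = [evaluate(op, stats) for op in node.operands]
    stats["dispatches"] += 1
    return KERNELS[node.kernel](*args)  # result is a fresh temporary buffer


a, b = [1.0, 2.0], [3.0, 4.0]
# a + b * 0.5 in Smalltalk parses as (a + b) * 0.5
expr = Node("mulScalar", (Node("add", (a, b)), 0.5))
stats = {"dispatches": 0}
print(evaluate(expr, stats), stats)  # [2.0, 3.0] {'dispatches': 2}
```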

Hand-crafted code would do something like this:

 temp := opencl invokeKernel: 'addAndMulScalar' with: a with: b with: 0.5.

The VGPU does not do the latter, for simplicity. Currently it does not generate any OpenCL C code; it works by composing simple functions. Perhaps in the future I will add a code generation step for optimization.
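
A code generation step like the one mentioned above could fuse such a chain into a single kernel. The sketch below, a hypothetical generator rather than anything VGPU provides, emits OpenCL C source for a fused elementwise kernel and estimates the global-memory traffic. Under a simple bandwidth-only model, the two-kernel version reads a and b and writes temp (3n floats), then reads temp and writes the result (2n floats), i.e. 5n in total, whereas the fused kernel moves 3n; a purely bandwidth-bound chain could therefore gain at most about 5/3, plus one dispatch saved.

```python
def fused_kernel_source(name, inputs, expression):
    """Emit OpenCL C for an elementwise kernel computing `expression` per item.

    `expression` is written in terms of the input names indexed by i.
    """
    params = ", ".join(f"__global const float *{x}" for x in inputs)
    return (
        f"__kernel void {name}({params}, __global float *result)\n"
        "{\n"
        "    size_t i = get_global_id(0);\n"
        f"    result[i] = {expression};\n"
        "}\n"
    )


def floats_moved(n, fused):
    """Global-memory floats read plus written for (a + b) * 0.5 over n items."""
    if fused:
        return 3 * n  # read a, read b, write result
    # add: read a, read b, write temp; mulScalar: read temp, write result
    return (2 * n + n) + (n + n)


print(fused_kernel_source("addAndMulScalar", ["a", "b"], "(a[i] + b[i]) * 0.5f"))
```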

Another problem is the proliferation of intermediate buffers. Some samples avoid intermediate buffers by using add:into: instead of +; the into: argument is simply the buffer where the result will be placed. Look at VirtualGPUSamples >> #imageChangedForGradient, VirtualGPUSamples >> #imageChangedForGradientOptimized and VirtualGPUSamples >> #imageChangedForGradientOptimizedMore.
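
The difference between + and add:into: can be illustrated with a CPU analogue: the allocating form creates a new result buffer on every call, while the into form writes into a caller-supplied buffer that can be reused across iterations. The Python functions below are hypothetical analogues written for illustration, not the VGPU methods themselves.

```python
def add(left, right):
    """Allocating form, analogue of `a + b`: returns a brand-new buffer."""
    return [x + y for x, y in zip(left, right)]


def add_into(left, right, result):
    """In-place form, analogue of add:into:; overwrites and returns `result`."""
    if not (len(left) == len(right) == len(result)):
        raise ValueError("buffer sizes differ")
    for i in range(len(result)):
        result[i] = left[i] + right[i]
    return result


a, b = [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]
out = [0.0] * 3           # allocated once, outside the loop
for _ in range(100):      # e.g. one iteration per frame
    add_into(a, b, out)   # no new buffer per iteration
```

With the allocating form, the same loop would create 100 temporary buffers; on a GPU each of those is a device allocation.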

 
> In our context, to build a GSSA algorithm I guess we just have to
> combine some existing VGPU instructions (matrix computations), but
> for SPH simulations we will have to provide our own instructions.
> Is there any documentation on adding our own kernels and instructions?

There is no documentation. We should have a tool like Doxygen so that I can write the documentation while I am writing the methods.

Anyway, I will write some documentation here for now.

First of all, look at the existing kernels. To do that, put this in a Playground and Do it (you need the GTInspector):

EmbObjectBrowser openBrowser.

That opens a browser that I use to edit the OpenCL C code. It has some bugs, but it is better than editing a huge string in a Smalltalk method. Let's look at VGPULinearAlgebraSources. There you will see the 'kernels' category containing two methods: #matrixKernels and #vectorKernels. If you look at #vectorKernels, you will see only the OpenCL C code.

If you now go to Nautilus and look at the VGPULinearAlgebraSources class, you will see that it is a subclass of EmbObjectContainer and that #vectorKernels and #matrixKernels are ordinary Smalltalk methods. #vectorKernels looks like this:

vectorKernels
    <embeddedObject>
    ^ '
// Vector binary operations
__kernel void floatVector_add(__global float *left, __global float *right, __global float *result)
...
'

For examples of actually invoking the kernels, look at the following methods:
VGPUFloatMatrix >> #abs
VGPUFloatMatrix >> #absInto
VGPUFloatMatrix >> #discreteGradient
VGPUFloatMatrix >> #discreteGradientInto

As for the OpenCL package, it just provides bindings for the C OpenCL API. You can also use it if you want, but you are on your own :)

> I did some experiments with GPU computing in JavaScript. As I understood it, the computation stays inside the GPU the whole time, so you just send the data and the programs once and let the GPU compute for you. The problem, though, is reading the data back once they are computed. With WebGL it seems impossible; with WebCL it is possible but not easy. Hence the choice of OpenCL, I guess.
>
> I describe how I computed the powers of a Markov matrix here: http://revue.sesamath.net/spip.php?article651 (in particular, click on "webGL sans three.js"). I also did some experiments here: http://irem.univ-reunion.fr/spip.php?article797 (but those use three.js, if I remember correctly).

I have been taking a look at these so-called HTML5/WebGL technologies. They do not convince me, since I am more interested in desktop applications. I was thinking of making a 3D level editor in HTML5, JS and WebGL to test the technology, but it seems to have many problems. Last week, Milton and I found an easy way to draw a Morph into a Woden texture, so I am going to use Pharo for the editor.

The main problem I have with web technologies is the one-size-fits-all mentality that surrounds them: JavaScript as an IR, the ugly asm.js hack, and the lack of support for UDP sockets, which any reasonable real-time online game requires.

Best regards,
Ronie

In reply to Alain Busser's message of 2015-05-12 18:36 GMT+02:00.

Re: Using VirtualGPU

SergeStinckwich
On Tue, May 19, 2015 at 11:30 AM, Ronie Salgado wrote:
> Hi all,
>
> Sorry for answering so late; I am busy at ICSE, where tomorrow I have
> to defend my poster for the Student Research Competition. The topic is
> profiling over the OpenCL bindings.

No problem, Ronie. Thank you for your reply.

Do you have a copy of your poster somewhere?

> The DSL is actually an abstraction over the OpenCL API. Each operation is
> stored in a simple intermediate representation that is used to call a
> single OpenCL kernel.
>
> There is no overhead in terms of memory transfers between intermediate
> operations, because the data are kept on the GPU the whole time. There is,
> however, an overhead in terms of kernel dispatching. For example, the
> expression a + b * 0.5 in the VGPU DSL is interpreted as the following
> pseudocode:
>
>  temp := opencl invokeKernel: 'add' with: a with: b.
>  temp2 := opencl invokeKernel: 'mulScalar' with: temp with: 0.5.
>
> Hand-crafted code would do something like this:
>
>  temp := opencl invokeKernel: 'addAndMulScalar' with: a with: b with: 0.5.
>
> The VGPU does not do the latter, for simplicity. Currently it does not
> generate any OpenCL C code; it works by composing simple functions.
> Perhaps in the future I will add a code generation step for optimization.

OK, I understand. What would be the speed benefit of code generation
compared with the current version?

>>
>> In our context, to build a GSSA algorithm I guess we just have to
>> combine some existing VGPU instructions (matrix computations), but
>> for SPH simulations we will have to provide our own instructions.
>> Is there any documentation on adding our own kernels and instructions?
>
>
> There is no documentation. We should have a tool like Doxygen so that I
> can write the documentation while I am writing the methods.
>
> Anyway, I will write some documentation here for now.

Great ;-)

Can we start a Pillar chapter in the PharoLimbo repository?
https://github.com/SquareBracketAssociates/PharoLimbo

> First of all, look at the existing kernels. To do that, put this in a
> Playground and Do it (you need the GTInspector):
>
> EmbObjectBrowser openBrowser.
>
> That opens a browser that I use to edit the OpenCL C code. It has some
> bugs, but it is better than editing a huge string in a Smalltalk method.
> Let's look at VGPULinearAlgebraSources. There you will see the 'kernels'
> category containing two methods: #matrixKernels and #vectorKernels. If
> you look at #vectorKernels, you will see only the OpenCL C code.
>
> If you now go to Nautilus and look at the VGPULinearAlgebraSources class,
> you will see that it is a subclass of EmbObjectContainer and that
> #vectorKernels and #matrixKernels are ordinary Smalltalk methods.
> #vectorKernels looks like this:
>
> vectorKernels
>     <embeddedObject>
>     ^ '
> // Vector binary operations
> __kernel void floatVector_add(__global float *left, __global float *right,
> __global float *result)
> ...
> '
>
> For examples of actually invoking the kernels, look at the following
> methods:
> VGPUFloatMatrix >> #abs
> VGPUFloatMatrix >> #absInto
> VGPUFloatMatrix >> #discreteGradient
> VGPUFloatMatrix >> #discreteGradientInto
>
> As for the OpenCL package, it just provides bindings for the C OpenCL API.
> You can also use it if you want, but you are on your own :)

I had already understood part of that.

Thank you for your help.
Regards,

Serge Stinckwich