NativeBoost : optimisation of the machine code generation

Thomas Bany

NativeBoost : optimisation of the machine code generation

Hi everyone,

I'm trying to reduce the computation time of the following pseudo-code:

- memory allocation (~40 doubles)

- object heap to C heap copying

- NativeBoost call (nbCall:)

- memory freeing

The time profiling results are below:

- 24*3600 calls: > 1 minute

- 24*3600 calls with only memory allocation and copying: < 1 second

- 1 call with a 24*3600 loop inside the C code: < 1 second

So it appears that the very costly step is the transition from Pharo to C. I was wondering whether it is possible to drastically reduce this time by doing something like generating the machine code once and calling it multiple times?

Thanks in advance !

Thomas.

Luc Fabresse

Re: NativeBoost : optimisation of the machine code generation

Hi Thomas,

The machine code for marshalling the arguments is generated once and for all, so the penalty does not come from there.

Please send the code you wrote for these micro-benchmarks so I can better understand what is happening.

Luc

abergel

Re: NativeBoost : optimisation of the machine code generation

In reply to this post by Thomas Bany

Hi Thomas,

Please share with us how it goes. Your experience is important to us.

Alexandre

kilon.alios

Re: NativeBoost : optimisation of the machine code generation

In reply to this post by Thomas Bany

I think that if you posted the code, preferably a version that contains only the problem, it would be easier to test, debug and investigate.

Thomas Bany

Re: NativeBoost : optimisation of the machine code generation

@Alexandre: sure, no problem!

@Luc:

I'm not sure how much code I can provide without being too specific, but here is how it goes:

Let's say I have the Smalltalk code below:

MyClass>>withNBCall
    externalArray := NBExternalArrayOfDoubles new: self internalArray size.
    output := NBExternalArrayOfDoubles new: 4.
    [self actualNBCallWith: externalArray address storeResultIn: output address]
        ensure: [externalArray free. output free].

MyClass>>withNBCallCommented
    externalArray := NBExternalArrayOfDoubles new: self internalArray size.
    output := NBExternalArrayOfDoubles new: 4.
    ["self actualNBCallWith: externalArray address storeResultIn: output address"]
        ensure: [externalArray free. output free].

MyClass>>actualNBCallWith: externalArray storeResultIn: output
    <primitive: 'primitiveNativeCall' module: 'NativeBoostPlugin' error: errorCode>
    ^self nbCall: #(void callToC(double * externalArray, double * output)) module: 'lib/myModule.dll'

And the C code below:

void callToC(double * externalArray, double * output) {
    computationWith(externalArray, output);
}

void specialCallToC(double * externalArray, double * output) {
    unsigned int i;

    /* Repeat the computation 24*3600 times inside a single native call. */
    for (i = 0; i < 24*3600; i++)
        computationWith(externalArray, output);
}

Now I have the following code typed into the Time Profiler tool:

object := (MyClass new) variousInitialization; yourself.
24*3600 timesRepeat: [object withNBCall]

>> Over 1 minute of computation time, of which over 99% is spent in primitives. Also, I don't see the nbCall: in the tree.

object := (MyClass new) variousInitialization; yourself.
24*3600 timesRepeat: [object withNBCallCommented]

>> Less than 1 second.

object := (MyClass new) variousInitialization; yourself.
object withNBCall

>> Less than 1 millisecond.

object := (MyClass new) variousInitialization; yourself.
object withNBSpecialCall "This time, I use the specialCallToC() function"

>> Around 20 milliseconds.
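One way to cross-check measurements like these is to time the same number of calls from a small C harness, with Pharo out of the picture. The following is only a sketch under stated assumptions: the body of `computationWith` is a made-up placeholder (the real one is not shown in this thread), and `INPUT_SIZE`, `OUTPUT_SIZE` and `timeCalls` are hypothetical names introduced for illustration.

```c
#include <time.h>

#define INPUT_SIZE 40   /* assumption: "~40 doubles" from the first post */
#define OUTPUT_SIZE 4   /* matches the 4-element output array in the post */

/* Hypothetical placeholder: the real computationWith() is not shown in the thread.
   It sums the inputs and writes four multiples of that sum. */
void computationWith(double *externalArray, double *output)
{
    double sum = 0.0;
    for (int i = 0; i < INPUT_SIZE; i++)
        sum += externalArray[i];
    for (int i = 0; i < OUTPUT_SIZE; i++)
        output[i] = sum * (i + 1);
}

/* Same shape as callToC() in the post. */
void callToC(double *externalArray, double *output)
{
    computationWith(externalArray, output);
}

/* Calls fn callCount times and returns the CPU time used, in seconds. */
double timeCalls(void (*fn)(double *, double *), int callCount,
                 double *in, double *out)
{
    clock_t start = clock();
    for (int i = 0; i < callCount; i++)
        fn(in, out);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling `timeCalls(callToC, 24 * 3600, in, out)` from a small `main` and printing the result gives a C-only baseline. If that baseline is already slow, the cost is in the C code itself rather than in the Pharo-to-C transition.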

All right, that's a pile of code, but I hope it helps :)

On a side note:

Pharo 3, Win 7 32-bit
I'm not at work anymore and don't have my code with me, so I will double-check tomorrow that I didn't provide false information, but I think it accurately reflects what I do.

Again, thanks for the interest in my issue!

Thomas.

Thomas Bany

Re: NativeBoost : optimisation of the machine code generation

I forgot the copying of the data from the object heap to the C heap:

MyClass>>withNBCall
    externalArray := NBExternalArrayOfDoubles new: self internalArray size.
    output := NBExternalArrayOfDoubles new: 4.
    "Copy each value from the object heap into the external (C heap) array."
    1 to: self internalArray size do: [ :index |
        externalArray at: index put: (self internalArray at: index) ].
    [self actualNBCallWith: externalArray address storeResultIn: output address]
        ensure: [externalArray free. output free].

MyClass>>withNBCallCommented
    externalArray := NBExternalArrayOfDoubles new: self internalArray size.
    output := NBExternalArrayOfDoubles new: 4.
    1 to: self internalArray size do: [ :index |
        externalArray at: index put: (self internalArray at: index) ].
    ["self actualNBCallWith: externalArray address storeResultIn: output address"]
        ensure: [externalArray free. output free].

Thomas.

stepharo

Re: NativeBoost : optimisation of the machine code generation

In reply to this post by Thomas Bany

NativeBoost methods compile their native code only once (the first time they
are executed), and again when the sessionId changes (because you may be on a
different platform).
So it already works the way you describe. The assembly code is cached in the
method literal.
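The "generate once, regenerate when the session changes" idea can be sketched in C as follows. This is an illustration of the caching pattern only, not NativeBoost's actual implementation: `CallSite`, `generateCode`, `callCached` and the `twice` function are all hypothetical names, and `generateCode` merely stands in for the expensive code-generation step.

```c
#include <stddef.h>

typedef double (*NativeFn)(double);

/* Hypothetical cache slot, playing the role of the "method literal". */
typedef struct {
    NativeFn code;     /* cached entry point, NULL until the first call */
    int sessionId;     /* session in which `code` was generated */
    int generations;   /* number of times the code was (re)generated */
} CallSite;

/* Hypothetical "generated" function. */
static double twice(double x)
{
    return 2.0 * x;
}

/* Stand-in for the expensive code-generation step. */
static NativeFn generateCode(void)
{
    return twice;
}

/* Generates the code on first use or after a session change,
   then calls through the cached pointer. */
double callCached(CallSite *site, int currentSessionId, double arg)
{
    if (site->code == NULL || site->sessionId != currentSessionId) {
        site->code = generateCode();
        site->sessionId = currentSessionId;
        site->generations++;
    }
    return site->code(arg);
}
```

With a call site initialised to `{ NULL, 0, 0 }`, repeated calls in the same session leave `generations` at 1, and only a call with a different session id raises it to 2.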

Stef

Thomas Bany

Re: NativeBoost : optimisation of the machine code generation

Okay, I found the issue, and it was me doing a lazy benchmark: I had forgotten a debug printing function in the C code, which I had removed between the benchmarks.
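A common way to keep a leftover debug print from skewing a benchmark is to put it behind a compile-time switch, so that benchmark builds contain no printing at all. This is a minimal sketch assuming a C99 compiler: the `TRACE` macro and the `DEBUG_TRACE` flag are hypothetical names, and the body of `computationWith` is a placeholder, since the real function is not shown in this thread.

```c
#include <stdio.h>

/* Hypothetical macro: expands to nothing unless built with -DDEBUG_TRACE. */
#ifdef DEBUG_TRACE
#define TRACE(...) fprintf(stderr, __VA_ARGS__)
#else
#define TRACE(...) ((void)0)
#endif

/* Placeholder computation: doubles the first input value. */
void computationWith(double *externalArray, double *output)
{
    /* Printed only in debug builds; costs nothing otherwise. */
    TRACE("computationWith: first input = %f\n", externalArray[0]);
    output[0] = externalArray[0] * 2.0;
}
```

Building the library once with `-DDEBUG_TRACE` for debugging and once without it for timing keeps the compared runs on the same source, differing only in that flag.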

Thanks again for your time !

Thomas.
