
Re: NativeBoost : optimisation of the machine code generation

Posted by Thomas Bany on Aug 07, 2014; 5:20pm
URL: https://forum.world.st/NativeBoost-optimisation-of-the-machine-code-generation-tp4772327p4772365.html

I forgot the step that copies the data from the object heap to the C heap:

MyClass>>withNBCall
   externalArray := NBExternalArrayOfDoubles new: self internalArray size.
   output := NBExternalArrayOfDoubles new: 4.
   1 to: self internalArray size do: [ :index | externalArray at: index put: (self internalArray at: index) ].
   [self actualNBCallWith: externalArray address storeResultIn: output address] ensure: [externalArray free. output free].

MyClass>>withNBCallCommented
   externalArray := NBExternalArrayOfDoubles new: self internalArray size.
   output := NBExternalArrayOfDoubles new: 4.
   1 to: self internalArray size do: [ :index | externalArray at: index put: (self internalArray at: index) ].
   ["self actualNBCallWith: externalArray address storeResultIn: output address"] ensure: [externalArray free. output free].

Thomas.



2014-08-07 19:15 GMT+02:00 Thomas Bany <[hidden email]>:
@ Alexandre: sure, no problem!

@ Luc:

I'm not sure how much code I can provide without being too specific, but here is how it goes:

  • Let's say I have the Smalltalk code below:
MyClass>>withNBCall
   externalArray := NBExternalArrayOfDoubles new: self internalArray size.
   output := NBExternalArrayOfDoubles new: 4.
   [self actualNBCallWith: externalArray address storeResultIn: output address] ensure: [externalArray free. output free].

MyClass>>withNBCallCommented
   externalArray := NBExternalArrayOfDoubles new: self internalArray size.
   output := NBExternalArrayOfDoubles new: 4.
   ["self actualNBCallWith: externalArray address storeResultIn: output address"] ensure: [externalArray free. output free].

MyClass>>actualNBCallWith: externalArray storeResultIn: output
   <primitive: 'primitiveNativeCall' module: 'NativeBoostPlugin' error: errorCode>
   ^self nbCall: #(void callToC(double * externalArray, double * output)) module: 'lib/myModule.dll'

  • And the C code below:
void callToC(double * externalArray, double * output) {
   computationWith(externalArray, output);
}

void specialCallToC(double * externalArray, double * output) {
   unsigned int i;
   for (i = 0; i < 24*3600; i++)
      computationWith(externalArray, output);
}

  • Now I run the following code in the Time Profiler tool:
object := (MyClass new) variousInitialization; yourself.
24*3600 timesRepeat: [object withNBCall]
>> Over 1 minute of computation time, of which over 99% is spent in primitives. Also, I don't see nbCall: in the tree.

object := (MyClass new) variousInitialization; yourself.
24*3600 timesRepeat: [object withNBCallCommented]
>> Less than 1 second.

object := (MyClass new) variousInitialization; yourself.
object withNBCall
>> Less than 1 millisecond.

object := (MyClass new) variousInitialization; yourself.
object withNBSpecialCall "This time, I use the specialCallToC() function"
>> Around 20 milliseconds.
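
As a rough check on the figures above, the per-iteration cost can be worked out directly. The sketch below assumes round values (60 s for "over 1 minute", 0.020 s for "around 20 milliseconds"); per_call_us, TOTAL_FROM_PHARO_S and TOTAL_INSIDE_C_S are hypothetical names used only for this illustration.

```c
#include <stdio.h>

/* Number of iterations used in every test above: 24*3600. */
#define ITERATIONS (24L * 3600L)

/* Assumed round figures, not measured values. */
#define TOTAL_FROM_PHARO_S 60.0   /* "over 1 minute": loop in Pharo, one nbCall: per iteration */
#define TOTAL_INSIDE_C_S   0.020  /* "around 20 milliseconds": loop inside the C function */

/* Average cost of one iteration, in microseconds. */
static double per_call_us(double total_seconds, long calls) {
    return total_seconds * 1e6 / (double)calls;
}

/* Prints both averages so they can be compared side by side. */
static void print_per_call_costs(void) {
    printf("loop in Pharo: at least %.0f us per iteration\n",
           per_call_us(TOTAL_FROM_PHARO_S, ITERATIONS));
    printf("loop in C:     about %.2f us per iteration\n",
           per_call_us(TOTAL_INSIDE_C_S, ITERATIONS));
}
```

Under these assumptions, each iteration driven from Pharo costs roughly 0.7 milliseconds, against roughly 0.2 microseconds when the loop runs inside C, so almost all of the time is spent outside the computation itself.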


All right, that's a pile of code, but I hope it helps :)

On a side note:
  • Pharo 3, Win 7 32-bit
  • I'm not at work anymore and don't have my code with me, so I will double-check tomorrow that I didn't provide false information, but I think it accurately reflects what I do.

Again, thanks for the interest in my issue!

Thomas.



2014-08-07 18:39 GMT+02:00 kilon alios <[hidden email]>:

I think that if you posted the code, preferably reduced to only the part that shows the problem, it would be easier to test, debug and investigate.



On Thu, Aug 7, 2014 at 6:25 PM, Thomas Bany <[hidden email]> wrote:
Hi everyone,

I'm trying to reduce the computation time of the following pseudo-code:

- memory allocation (~40 doubles)
- object heap to C heap copying
- NativeBoost call (nbCall:)
- memory freeing

The time profiling results are below:

- 24*3600 calls : > 1 minute
- 24*3600 calls with only memory allocation and copying : < 1 second
- 1 call with a 24*3600 loop inside the C code : < 1 second

So it appears that the very costly step is the transition from Pharo to C. I was wondering whether it is possible to drastically reduce this time, for example by generating the machine code once and calling it multiple times?
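
Independently of how NativeBoost handles the generated code, one way to pay the Pharo-to-C transition less often is to batch several computations into a single call. The following is only a minimal sketch under assumptions: batchCallToC is a hypothetical function, the computationWith shown here is a stand-in (it takes an explicit length and just sums its input), and the 4 results per run are taken from the size of the output array.

```c
#include <stddef.h>

/* Number of doubles produced per computation (the output array holds 4). */
#define OUTPUT_LEN 4

/* Hypothetical stand-in for the real computation: stores the sum of the
   inputs in output[0] and zeroes the remaining slots. */
static void computationWith(const double *input, size_t input_len, double *output) {
    double sum = 0.0;
    for (size_t i = 0; i < input_len; i++)
        sum += input[i];
    output[0] = sum;
    for (size_t k = 1; k < OUTPUT_LEN; k++)
        output[k] = 0.0;
}

/* Runs n_runs computations in one foreign call.
   inputs  : n_runs * input_len doubles, one contiguous block per run
   outputs : n_runs * OUTPUT_LEN doubles, one block of results per run */
void batchCallToC(const double *inputs, size_t input_len,
                  size_t n_runs, double *outputs) {
    for (size_t run = 0; run < n_runs; run++)
        computationWith(inputs + run * input_len, input_len,
                        outputs + run * OUTPUT_LEN);
}
```

With such a layout the image would allocate and fill one large external array, make a single call, and read all the results back afterwards, at the price of holding every input in C memory at once.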

Thanks in advance !

Thomas.