How to correctly boost performance with NativeBoost?

Matthieu
Hello everyone,

I know, I'm back with yet another question involving NativeBoost... Sorry!

Anyway, I am currently trying to improve the performance of a Pharo application by rewriting some parts in C and calling them through NativeBoost.

The application needs to compute a lot of 2D projections of points, so basically I need to compute a lot of cosines and sines for different points. Ultimately I'd like to do this with OpenMP to use all the processor cores, but for now I just want to run some tests to find the best way to combine Pharo and C.

The first test consisted of writing a very simple C function and calling it from Pharo:

#include <math.h>

typedef struct {
  double x;
  double y;
} Point;

/* Overwrites the point in place with (cos(x), sin(y)). */
void project (Point *p) {
  p->x = cos(p->x);
  p->y = sin(p->y);
}

In Pharo I created an NBExternalStructure subclass (MyPoint in the tests below) to match Point, and everything worked as expected.

I then ran some performance tests, and there are some results I don't really understand :(

1) The first test was:
test
| t1 t2 tC tPharo |
tC := 0. tPharo := 0.

"Pure Pharo version: time each projection individually and sum the timings."
1 to: 1000000 do: [ :i |
  | p1 |
  p1 := Point x: i + (1.0 / i) y: i - (1.0 / i).
  t1 := Time microsecondsToRun: [
    p1 setX: p1 x cos setY: p1 y sin.
  ].
  tPharo := tPharo + t1.
].

"NativeBoost version: same loop, calling the C function through primProject:."
1 to: 1000000 do: [ :i |
  | p2 |
  p2 := MyPoint x: i + (1.0 / i) y: i - (1.0 / i).
  t2 := Time microsecondsToRun: [
    self primProject: p2.
  ].
  tC := tC + t2.
].

The results here are around 185 ms for tC and 210 ms for tPharo.
Nothing really bothered me about this test, as the results seemed consistent (are they?).

2) The second test was:
test2
| tC tPharo array1 array2 size |
size := 1000000.
array1 := Array new: size.
array2 := Array new: size.
1 to: size do: [ :i |
  array1 at: i put: (Point x: i + (1.0 / i ) y: i - (1.0 / i)).
  array2 at: i put: (MyPoint x: i + (1.0 / i ) y: i - (1.0 / i)).
].

tPharo := Time millisecondsToRun: [
  array1 do: [ :each |
    each setX: each x cos setY: each y sin.
  ].
].

tC := Time millisecondsToRun: [
  array2 do: [ :each |
    self primProject: each.
  ].
].

The results here are around 3,500 ms for tPharo and 150 ms for tC.
I don't really understand why there is such a difference. I tried running the tC measurement first, but it changed nothing.
I've checked the results, and the contents of array2 are all updated correctly.
Is there a problem with tC or tPharo? What am I doing wrong?

On a related note, I also noticed that allocating the array of Pharo Points is faster than allocating the array of MyPoint C-ish structs. I did some more testing, and in general it seems faster to manipulate Pharo objects than C-ish structs or pointers. For example, reading from a Pharo array is faster than reading from a NativeBoost-to-C array.

Given that, what kind of C types should I use to avoid losing too much time manipulating them with NativeBoost? Is it preferable, for example, to use two separate doubles instead of a struct with two doubles inside?


Thank you very much,

Matthieu