FFI callout generator benchmarks

Igor Stasenko
I just made the initial FFI callout generator, which currently works
only with integer types.

The gap between the FFI plugin and native code is much smaller than expected:

NBFFICalloutTests new benchFFITest

#(3833 3064 310)

The first number is the FFI plugin callout,
the second is the native-code FFI callout,
and the third is a call to an empty method with the same number of arguments:

benchFFITest

        "exclude initialization from benchmarks"
        | time1 time2 time3 |
        self ffiTestInt: 1 with: 2 with: 3 with: 4.
        self nbTestInt: 1 with: 2 with: 3 with: 4.

        time1 := [ 1000000 timesRepeat: [ self ffiTestInt: 1 with: 2 with: 3 with: 4 ] ] timeToRun.
        time2 := [ 1000000 timesRepeat: [ self nbTestInt: 1 with: 2 with: 3 with: 4 ] ] timeToRun.
        time3 := [ 1000000 timesRepeat: [ self noOp: 1 with: 2 with: 3 with: 4 ] ] timeToRun.
       
        ^ { time1. time2. time3 }


Interestingly, if I replace the SmallInteger -> C int conversion
   proxy signed32BitValueOf: EAX.
which expands to:
 push        eax
 mov         eax,[ebp][0C]
 mov         eax,[eax][00000170]
 call        eax
 add         esp,004

with just:
   asm shr: EAX with: 1.

which does the same thing (except for large-integer conversion),
it saves about 16-50 msecs in a 1000000-iteration loop!
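
A minimal C sketch of the tag arithmetic behind that shortcut, assuming the 32-bit Squeak object format in which a SmallInteger oop is a 31-bit signed value shifted left by one with the low tag bit set. The function names are made up for illustration and are not part of the VM or NativeBoost. One thing the sketch makes visible: a purely logical shift only decodes non-negative values correctly, so negative SmallIntegers would need a sign-preserving shift (sar on x86).

```c
#include <stdint.h>

/* Hypothetical helper: tag a C int as a 32-bit SmallInteger oop (value * 2 + 1). */
uint32_t small_int_encode(int32_t value) {
    return ((uint32_t)value << 1) | 1u;
}

/* Hypothetical helper: decode with a sign-preserving shift, written so it
   does not rely on implementation-defined signed right shifts. */
int32_t small_int_decode(uint32_t oop) {
    if (oop & 0x80000000u) {
        /* Negative value: ~oop equals 2 * (-value - 1), so undo that. */
        return -(int32_t)(~oop >> 1) - 1;
    }
    return (int32_t)(oop >> 1);
}

/* What a plain logical shift (shr) yields: correct only when the value is >= 0. */
uint32_t small_int_decode_logical(uint32_t oop) {
    return oop >> 1;
}
```

For example, the oop for 3 is 7 and both decoders return 3, while for -5 only small_int_decode returns the original value.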


Oh, it looks like I picked a bad function for benchmarking:

/* test passing ints */
EXPORT(int) ffiTestInts(int c1, int c2, int c3, int c4) {
        printf("4 ints came in as\ni1 = %d (%x)\ni2 = %d (%x)\ni3 = %d (%x)\ni4 = %d (%x)\n", c1, c1, c2, c2, c3, c3, c4, c4);
        return c1+c2;
}

The nasty printf levels out any difference between the call types, so it's
not a very good function for benchmarking the coercion speed of a callout interface :(
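
If rebuilding the test library is an option, one way around this would be a variant with the same signature and no I/O. This is only a sketch: the name ffiTestInts4Quiet is hypothetical and not part of the FFI test suite, and the EXPORT macro is left out so the snippet compiles on its own.

```c
/* Hypothetical printf-free counterpart of ffiTestInts: same arguments and
   same return value, but no I/O to dominate the measured time. */
int ffiTestInts4Quiet(int c1, int c2, int c3, int c4) {
    (void)c3;  /* unused, kept so the argument count matches */
    (void)c4;
    return c1 + c2;
}
```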

Any suggestions for a function from some well-known external library that I
could use? It should take at least 3 int arguments.

--
Best regards,
Igor Stasenko AKA sig.

Re: FFI callout generator benchmarks

Igor Stasenko
Okay, I found one (but with just two arguments)...

It's the Windows kernel's IsBadWritePtr() function:

The IsBadWritePtr function verifies that the calling process has write
access to the specified range of memory.

BOOL IsBadWritePtr(
  LPVOID lp,
  UINT_PTR ucb
);

NBFFICalloutTests new benchFFITest2

#(967 325 286 200)

Here, as before:
 967 - FFI callout
 325 - native-code callout
 286 - calling an empty method with the same number of arguments in a loop
 200 - just a loop with an empty block

So the difference, depending on which baseline you subtract:
(967/325) asFloat 2.975384615384615
(967-200) / (325-200) asFloat 6.136
(967-286) / (325-286) asFloat  17.46153846153846

Pick whichever one you like most ;)
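
The three ratios above are the same calculation with a different baseline subtracted from both timings. A small C sketch of that arithmetic (the function name speedup is just for illustration):

```c
/* Speedup of the native callout over the FFI plugin after subtracting a
   fixed baseline cost from both timings (all values in milliseconds). */
double speedup(double ffi_ms, double native_ms, double baseline_ms) {
    return (ffi_ms - baseline_ms) / (native_ms - baseline_ms);
}
```

With a baseline of 0 this gives the raw ratio 967/325; with 200 it discounts the empty loop, and with 286 it also discounts the cost of the message send itself.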

Here is what the FFI code looks like:

ffiIsBadWritePtr: ptr size: blockSize
        <apicall: long 'IsBadWritePtr' (long long) module: 'Kernel32.dll' >
       
        self primitiveFailed

and here is what the native-code callout looks like:

nbIsBadWritePtr: ptr size: blockSize
        <primitive: 'primitiveNativeCall' module: 'NativeBoostPlugin'>
       
        ^ NBFFICallout apiCall: #( long 'IsBadWritePtr' (long long)) module: 'Kernel32.dll'

As you can see, I made it very similar to the original syntax, but it needs
to conform to Smalltalk syntax.
Both methods call the same function. The first goes through the FFI plugin,
while the second uses some clever tricks:
 - initially, the method is just a regular method without native code in its trailer.
 - the primitive detects that there is no native code to run, and fails.
 - control then passes to the NBFFICallout code, which checks the reason for the
   failure; if it sees that the method has no native code, it generates it,
   updates the method's trailer, and then calls the method again.
 If the primitive fails for some other reason, it simply throws an error.
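
The control flow of that trick can be sketched in C as follows. This is an illustration under assumptions, not the NativeBoost implementation: the struct, the status codes, and every function name are invented, and a plain C function stands in for the generated machine code.

```c
#include <stddef.h>

typedef int (*native_fn)(int, int);

/* Hypothetical stand-in for a compiled method whose trailer may hold native code. */
typedef struct {
    native_fn native_code;   /* NULL until code has been generated */
    int generate_count;      /* how many times code generation ran */
} method_t;

enum { PRIM_OK = 0, PRIM_NO_NATIVE_CODE = 1, PRIM_OTHER_ERROR = 2 };

/* Stands in for the machine code the generator would emit. */
static int add_two(int a, int b) { return a + b; }

/* The "primitive": runs the native code if present, otherwise fails with a reason. */
int primitive_native_call(method_t *m, int a, int b, int *result) {
    if (m->native_code == NULL) {
        return PRIM_NO_NATIVE_CODE;
    }
    *result = m->native_code(a, b);
    return PRIM_OK;
}

/* The fallback path: on a "no native code" failure, generate the code,
   install it in the method, and retry; other failures are passed on. */
int call_method(method_t *m, int a, int b, int *result) {
    int status = primitive_native_call(m, a, b, result);
    if (status == PRIM_NO_NATIVE_CODE) {
        m->native_code = add_two;
        m->generate_count++;
        status = primitive_native_call(m, a, b, result);
    }
    return status;
}
```

Only the first call pays for code generation; every later call succeeds directly in the primitive.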

--
Best regards,
Igor Stasenko AKA sig.