
FFI callout generator benchmarks


Igor Stasenko

I just made the initial FFI callout generator, which currently works
only with integer types.

The gap between the FFI plugin and native code is much smaller than one might expect:

NBFFICalloutTests new benchFFITest

#(3833 3064 310)

The first number is the FFI plugin callout,
the second is the native code FFI callout,
and the third is a call to an empty method with the same number of arguments
(times in milliseconds for 1000000 calls each):

benchFFITest

    "exclude initialization from benchmarks"
    | time1 time2 time3 |
    self ffiTestInt: 1 with: 2 with: 3 with: 4.
    self nbTestInt: 1 with: 2 with: 3 with: 4.

    time1 := [ 1000000 timesRepeat: [ self ffiTestInt: 1 with: 2 with: 3 with: 4 ] ] timeToRun.
    time2 := [ 1000000 timesRepeat: [ self nbTestInt: 1 with: 2 with: 3 with: 4 ] ] timeToRun.
    time3 := [ 1000000 timesRepeat: [ self noOp: 1 with: 2 with: 3 with: 4 ] ] timeToRun.

    ^ { time1. time2. time3 }

Interestingly, if I replace the SmallInteger -> C int conversion
    proxy signed32BitValueOf: EAX.
which expands to:
    push eax
    mov eax,[ebp][0C]
    mov eax,[eax][00000170]
    call eax
    add esp,004

with just:
    asm shr: EAX with: 1.

which does the same thing (except for the conversion of big integers),
it saves about 16-50 msecs in a 1000000-iteration loop!
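
A minimal sketch of why a one-bit shift can stand in for the conversion, assuming the usual 32-bit Squeak tagging where a SmallInteger oop is (value << 1) | 1. The function names are hypothetical and are not part of the generator; the second untag variant is only there to show how a logical shift (SHR) and an arithmetic shift (SAR) differ on negative values.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: a SmallInteger oop is (value << 1) | 1, with 31 bits of payload. */
static uint32_t tag_small_int(int32_t value) {
    assert(value >= -1073741824 && value <= 1073741823);
    return ((uint32_t)value << 1) | 1u;
}

/* Logical shift, like x86 SHR: drops the tag bit and zero-fills the top bit. */
static int32_t untag_shr(uint32_t oop) {
    return (int32_t)(oop >> 1);
}

/* Arithmetic shift, like x86 SAR, emulated by re-inserting the sign bit.
   The final cast relies on two's-complement wraparound, as on x86 compilers. */
static int32_t untag_sar(uint32_t oop) {
    return (int32_t)((oop >> 1) | (oop & 0x80000000u));
}
```

Under that assumption both variants agree for non-negative values (tagging 4 gives 9, and shifting 9 right by one gives 4 again). For a negative value such as -3, only the arithmetic variant gets back -3, so the plain shr shortcut would presumably be safe for non-negative arguments only.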

Ohh, it looks like I picked a bad function for benchmarking:

/* test passing ints */
EXPORT(int) ffiTestInts(int c1, int c2, int c3, int c4) {
    printf("4 ints came in as\ni1 = %d (%x)\ni2 = %d (%x)\ni3 = %d (%x)\ni4 = %d (%x)\n", c1, c1, c2, c2, c3, c3, c4, c4);
    return c1+c2;
}

The nasty printf levels out all the difference between the call types, so it's
not a very good function for benchmarking the coercion speed of a callout interface :(

Any good guess as to which function from some well-known external library I
could use? It should have at least 3 int arguments.

--
Best regards,
Igor Stasenko AKA sig.

Igor Stasenko

Re: FFI callout generator benchmarks

Okay, I found one (but with just two arguments)...

It's the windoze kernel's IsBadWritePtr() function:

The IsBadWritePtr function verifies that the calling process has write
access to the specified range of memory.

BOOL IsBadWritePtr(
    LPVOID lp,
    UINT_PTR ucb
);

NBFFICalloutTests new benchFFITest2

#(967 325 286 200)

Here, as before (times in milliseconds):
967 - FFI callout
325 - native code callout
286 - calling an empty method with the same number of arguments in a loop
200 - just a loop with an empty block

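So, the speedup, depending on which overhead you subtract:
(967/325) asFloat                  "2.975384615384615  (raw times)"
(967-200) / (325-200) asFloat      "6.136              (loop overhead subtracted)"
(967-286) / (325-286) asFloat      "17.46153846153846  (empty-method call overhead subtracted)"

Pick the one you like most ;)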
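
The three ratios above follow one pattern: subtract a common baseline from both timings, then divide. A small sketch of that arithmetic, where the function name and the idea of passing the baseline as a parameter are illustrative assumptions and not part of the benchmark code:

```c
#include <assert.h>

/* Ratio of two timings after subtracting a shared baseline overhead.
   A baseline of 0 gives the raw ratio. */
static double speedup(double slow_ms, double fast_ms, double baseline_ms) {
    assert(fast_ms > baseline_ms);   /* avoid dividing by zero or a negative span */
    return (slow_ms - baseline_ms) / (fast_ms - baseline_ms);
}
```

With the numbers from this run, speedup(967, 325, 0) is the raw ratio, speedup(967, 325, 200) removes the empty-loop cost, and speedup(967, 325, 286) removes the cost of an ordinary method call as well. The closer the baseline gets to the faster timing, the more sensitive the ratio becomes to measurement noise.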

Here is what the FFI code looks like:

ffiIsBadWritePtr: ptr size: blockSize
    <apicall: long 'IsBadWritePtr' (long long) module: 'Kernel32.dll' >

    self primitiveFailed

and here is what the native code callout looks like:

nbIsBadWritePtr: ptr size: blockSize
    <primitive: 'primitiveNativeCall' module: 'NativeBoostPlugin'>

    ^ NBFFICallout apiCall: #( long 'IsBadWritePtr' (long long)) module: 'Kernel32.dll'

As you can see, I made it very similar to the original syntax, but it needs
to conform to Smalltalk syntax.
Both methods call the same function. The first goes through the FFI plugin,
while the second uses some clever tricks:
- initially, the method is just a regular method with no native code in its trailer.
- the primitive detects that there is no native code to run, and fails.
- execution then falls through to the NBFFICallout code, which checks the reason for the failure;
if it sees that the method has no native code, it generates
it, changes the method's trailer, and then calls the method
again.
If the primitive fails for some other reason, it simply throws an error.
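
The fail, generate, retry sequence described above can be sketched outside Smalltalk roughly like this. Everything in the sketch is a hypothetical stand-in: the struct plays the role of a compiled method with its trailer, add_two plays the role of the generated machine code, and the status codes are invented for illustration. It is not the NativeBoost implementation.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*native_fn)(int, int);

enum prim_status { PRIM_OK, PRIM_NO_NATIVE_CODE, PRIM_OTHER_FAILURE };

/* Hypothetical stand-in for a compiled method and its native-code trailer. */
struct method {
    native_fn native_code;   /* NULL until code has been generated */
    int generate_count;      /* how many times code was generated */
};

/* Stand-in for the generated machine code. */
static int add_two(int a, int b) { return a + b; }

/* "Primitive": runs the native code if present, otherwise reports why it failed. */
static enum prim_status run_primitive(struct method *m, int a, int b, int *result) {
    if (m->native_code == NULL) return PRIM_NO_NATIVE_CODE;
    *result = m->native_code(a, b);
    return PRIM_OK;
}

/* Fallback path: on a "no native code" failure, generate the code, install it,
   and retry once. Any other failure is reported as an error (-1). */
static int call_method(struct method *m, int a, int b, int *result) {
    enum prim_status status = run_primitive(m, a, b, result);
    if (status == PRIM_NO_NATIVE_CODE) {
        m->native_code = add_two;   /* "change the method's trailer" */
        m->generate_count++;
        status = run_primitive(m, a, b, result);
    }
    return status == PRIM_OK ? 0 : -1;
}
```

The point of the pattern is that the generation cost is paid only on the first call; every later call finds the code already installed and takes the fast path directly.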

--
Best regards,
Igor Stasenko AKA sig.