
FFI is sloooow ;) [ Was: Re: [squeak-dev] Re: Alien vs. FFI benchmarks (Re: Trying to load ALienOpenGL into 4.1 alpha...) ]


Igor Stasenko


Okay, here are my results with FFI callout code generated on the fly:

NativeCodeTests new benchWinCall

#(752 288 205)

It benchmarks the famous glGetError() function :)

benchWinCall
| time1 time2 time3 |

time1 := [ 1000000 timesRepeat: [ self ffiglError ] ] timeToRun.
time2 := [ 1000000 timesRepeat: [ self nbglError ] ] timeToRun.
time3 := [ 1000000 timesRepeat: [ ] ] timeToRun.

^ { time1. time2. time3 }

((752-205) /(288-205) ) asFloat
6.59 times faster!!
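The ratio subtracts the empty-loop time (205 ms) from both measurements before dividing, so only the cost of the calls themselves is compared. Below is a small workspace sketch of the same arithmetic, worked out per call. The variable names are illustrative. The numbers are the three printed by benchWinCall for 1,000,000 iterations.

```smalltalk
"Per-call cost from the benchWinCall results (milliseconds for 1,000,000 runs)."
| results calls loopMs ffiNs nbNs speedup |
results := #(752 288 205).
calls := 1000000.
loopMs := results at: 3.                                  "empty timesRepeat: loop"
ffiNs := ((results at: 1) - loopMs) * 1000000 / calls.   "ms -> ns per FFI call"
nbNs := ((results at: 2) - loopMs) * 1000000 / calls.     "ms -> ns per native call"
speedup := ffiNs / nbNs.                                  "the Fraction 547/83"
Transcript showCr: 'FFI ns/call: ', ffiNs printString, '  native ns/call: ', nbNs printString.
```

On that machine this works out to roughly 547 ns per FFI callout versus 83 ns for the native stub.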

The original FFI method is the following:

ffiglError
<apicall: long 'glGetError' () module: 'opengl32.dll'>
self primitiveFailed

Then I replaced it with the following:

nbglError
<primitive: 'primitiveNativeCall' module: 'NativeBoostPlugin'>
"primitiveExternalCall"
self primitiveFailed

The native code attached to this method is written by hand ;)
Here it is:

genWinCall
| addr asm fn |
self winCall.

" ( (NativeCodeTests methodDict at: #winCall) literalAt: 1) getHandle "
addr := ((self class methodDict at: #ffiglError) literalAt: 1) getHandle.
addr := addr asInteger.

asm := AJx86AsmBuilder x86.

fn := NBInterpreterProxyGen functions at: #signed32BitIntegerFor: .

asm
push: EBP;
mov: ESP->EBP;

"call external function"
mov: (asm imm: addr) -> EAX;
call: EAX;
push: EAX; "push return value"

"push function to call"
mov: (asm mem: EBP) + 12 to: EAX;
mov: (asm mem: EAX) + (fn index * 4) to: EAX;
push: EAX;

" call gate function"
mov: (asm mem: EBP) + 8 to: EAX;
call: EAX;

" clear the stack "
leave;
ret.

self install: asm bytes into: (self class methodDict at: #nbglError)
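Reading genWinCall, the stub appears to receive two arguments:

- the address of the call gate at [EBP+8]
- a table of VM function pointers at [EBP+12], indexed in 4-byte steps

That reading is inferred from the code and is not spelled out in the thread. The plain-Smalltalk model below only mirrors the offset arithmetic for a 32-bit frame after the push: EBP / mov: ESP->EBP prologue. The slot labels are descriptive names chosen for illustration, not part of any API.

```smalltalk
"Offsets seen by the stub after push: EBP; mov: ESP->EBP on 32-bit x86.
 Slot labels are descriptive names inferred from genWinCall, not an API."
| wordSize frame slotAt tableOffset |
wordSize := 4.                     "one 32-bit stack slot or pointer"
frame := #('saved EBP' 'return address' 'call gate' 'VM function table').
slotAt := [:byteOffset | frame at: byteOffset // wordSize + 1].   "Arrays are 1-based"
tableOffset := [:fnIndex | fnIndex * wordSize].                    "(fn index * 4)"
Transcript showCr: (slotAt value: 8); showCr: (slotAt value: 12).
```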

Of course, all this hackish asm stuff should be replaced by a callout
autogenerator, which would do everything just from the FFI pragma
(<apicall: long 'glGetError' () module: 'opengl32.dll'>)
and nothing else :)
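Such a generator would need at least the return type, the function name and the module from the pragma. As a toy illustration, the hypothetical snippet below pulls those pieces out of the pragma text with ordinary string operations. It handles only this simple shape and does not reflect how FFI or NativeBoost actually process pragmas.

```smalltalk
"Hypothetical toy: extract return type, function name and module from the
 pragma text. Only handles this exact single-quoted shape."
| pragmaSource parts returnType functionName moduleName |
pragmaSource := '<apicall: long ''glGetError'' () module: ''opengl32.dll''>'.
parts := pragmaSource subStrings: ''''.          "split on single quotes"
returnType := (parts first subStrings: ' ') last. "'<apicall: long ' -> 'long'"
functionName := parts at: 2.
moduleName := parts at: 4.
Transcript showCr: returnType, ' ', functionName, ' from ', moduleName.
```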

Later I will post additional benchmarks for functions that take arguments.

--
Best regards,
Igor Stasenko AKA sig.

Bert Freudenberg


On 09.04.2010, at 10:44, Igor Stasenko wrote:

> Okay, here's my results with ffi callout code generated on the fly:
> [...]

Way cool :)

- Bert -

Igor Stasenko


On 9 April 2010 11:49, Bert Freudenberg wrote:
>
> Way cool :)
>
:)
And here is the most important piece of the puzzle, and it wooorks!!!

testMovableStuff
"Test that if native code calls a VM function which triggers
a full GC and relocates the native code, the code survives the move.

The native code returns the difference between the primitive method's
address before and after the GC; if the method has moved, the difference is nonzero."

| asm fullGC primitiveMethod code |

asm := AJx86AsmBuilder x86.

primitiveMethod := NBInterpreterProxyGen functions at: #primitiveMethod.
fullGC := NBInterpreterProxyGen functions at: #fullGC .

asm
push: EBP;
mov: ESP->EBP;

mov: (asm mem: EBP) + 12 to: EAX;
mov: (asm mem: EAX) + (primitiveMethod index * 4) to: EAX;
call: EAX;
push: EAX;

"push function to call - fullGC"
mov: (asm mem: EBP) + 12 to: EAX;
mov: (asm mem: EAX) + (fullGC index * 4) to: EAX;
push: EAX;

" call gate function"
mov: (asm mem: EBP) + 8 to: EAX;
call: EAX;

mov: (asm mem: EBP) + 12 to: EAX;
mov: (asm mem: EAX) + (primitiveMethod index * 4) to: EAX;
call: EAX;

pop: EDX;
sub: EAX with: EDX;
shl: EAX with: 1;
inc: EAX;

" clear the stack "
leave;
ret.

code := asm bytes.
" we should not crash here ;) "
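"answer one result per run, so the ten differences get printed"
^ (1 to: 10) collect: [:i |
Array new: 10. "allocate a small throwaway object before each run"
self install: code into: (self class methodDict at: #movableStub).
self movableStub ]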
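The last four instructions (pop: EDX; sub: EAX with: EDX; shl: EAX with: 1; inc: EAX) turn the raw address difference into a SmallInteger object pointer. In the 32-bit Squeak object format, a SmallInteger n is stored as the word 2n + 1, with the low tag bit set. Below is a small model of that encoding, using one of the values the test prints. It ignores 32-bit wraparound and the 31-bit range check a real VM would perform.

```smalltalk
"Model of the SmallInteger tagging done by shl: EAX with: 1; inc: EAX.
 Ignores 32-bit wraparound and the VM's 31-bit range check."
| tag untag diff oop |
tag := [:n | (n bitShift: 1) + 1].          "shift left, set the tag bit"
untag := [:word | word bitShift: -1].       "arithmetic shift drops the tag bit"
diff := -2544.                              "one of the differences printed by the test"
oop := tag value: diff.
Transcript showCr: oop printString, ' -> ', (untag value: oop) printString.
```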

------------

NativeCodeTests new testMovableStuff

#(-1581816 -2544 -3916 -2544 -2544 -2544 -2544 -2544 -2544 -2544)

This means that while the native code is calling fullGC, the GC moves the code away,
but then a magic 'call gate' finds the way back to the right piece of code! :)
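The thread does not explain how the call gate finds its way back. Here is one plausible scheme, offered purely as an assumption:

1. Before calling into the VM, remember the return point as an offset from the start of the native code.
2. After the call, rebuild the absolute address from the code's new location.

The addresses below are made up; only the -2544 shift comes from the test output.

```smalltalk
"Hypothetical relocation-safe return: keep an offset into the code, not an
 absolute address. All addresses are invented for illustration."
| oldStart returnAddress savedOffset newStart resumeAddress |
oldStart := 16r100000.                          "stub start before fullGC (made up)"
returnAddress := oldStart + 37.                 "return point inside the stub (made up)"
savedOffset := returnAddress - oldStart.        "saved before calling into the VM"
newStart := oldStart - 2544.                    "the GC moved the code, as in the test"
resumeAddress := newStart + savedOffset.        "rebuilt after the call returns"
Transcript showCr: resumeAddress printString.
```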

