FFI is sloooow ;) [ Was: Re: [squeak-dev] Re: Alien vs. FFI benchmarks (Re: Trying to load ALienOpenGL into 4.1 alpha...) ]


Igor Stasenko
Okay, here are my results with FFI callout code generated on the fly:

NativeCodeTests new benchWinCall

#(752 288 205)

It benchmarks the famous glGetError() function :)

benchWinCall
        | time1 time2 time3 |
       
       
        time1 := [ 1000000 timesRepeat: [ self ffiglError ] ] timeToRun.
        time2 := [ 1000000 timesRepeat: [ self nbglError ] ] timeToRun.
        time3 := [ 1000000 timesRepeat: [  ] ] timeToRun.
       
        ^ { time1. time2. time3 }

((752-205) /(288-205) ) asFloat
6.59  times faster!!
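
The same three timings can also be read as a per-call cost. This is only a sketch: it assumes timeToRun reports milliseconds (as it does in Squeak) and that the empty loop is a fair estimate of the loop overhead. The variable names are made up for illustration.

```smalltalk
| timings calls loopOverhead nsPerCall |
timings := #(752 288 205).      "FFI, native stub, empty loop (ms)"
calls := 1000000.
loopOverhead := timings last.
"subtract the loop overhead, then convert ms per million calls to ns per call"
nsPerCall := timings allButLast
        collect: [:ms | ((ms - loopOverhead) * 1000000) / calls].
nsPerCall   "#(547 83)"
```

Under those assumptions that is roughly 547 ns per call through FFI versus 83 ns through the native stub on this machine.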

The original FFI method is the following:

ffiglError
        <apicall: long 'glGetError' () module: 'opengl32.dll'>
        self primitiveFailed

And then I replace it with the following:

nbglError
        <primitive: 'primitiveNativeCall' module: 'NativeBoostPlugin'>
"primitiveExternalCall"
        self primitiveFailed

The native code that gets attached to this method is written manually ;)
Here it is:

genWinCall
        | addr asm fn |
        self winCall.
       
        " ( (NativeCodeTests methodDict at: #winCall) literalAt: 1) getHandle "
        addr := ((self class methodDict at: #ffiglError) literalAt: 1) getHandle.
        addr := addr asInteger.
       
        asm := AJx86AsmBuilder x86.

        fn := NBInterpreterProxyGen functions at: #signed32BitIntegerFor: .
       
        asm
                push: EBP;
                mov:  ESP->EBP;
               
                "call external function"
                mov: (asm imm: addr) -> EAX;
                call: EAX;
                push: EAX; "push return value"

                "push function to call"
                mov: (asm mem: EBP) + 12 to: EAX;
                mov: (asm mem: EAX) + (fn index * 4) to: EAX;
                push: EAX;
               
                " call gate function"
                mov: (asm mem: EBP) + 8 to: EAX;
                call: EAX;

                " clear the stack "
                leave;
                ret.
               
        self install: asm bytes into: (self class methodDict at: #nbglError)
       
Of course, all this hackish asm stuff should be replaced by a callout
autogenerator, which should do everything given just an FFI pragma
(<apicall: long 'glGetError' () module: 'opengl32.dll'>)
and nothing else :)

I will post additional benchmarks for functions that take arguments later.

--
Best regards,
Igor Stasenko AKA sig.


Re: FFI is sloooow ; ) [ Was: Re: [squeak-dev] Re: Alien vs. FFI benchmarks (Re: Trying to load ALienOpenGL into 4.1 alpha...) ]

Bert Freudenberg
On 09.04.2010, at 10:44, Igor Stasenko wrote:

> [full quote of the previous message snipped]

Way cool :)

- Bert -




Re: FFI is sloooow ; ) [ Was: Re: [squeak-dev] Re: Alien vs. FFI benchmarks (Re: Trying to load ALienOpenGL into 4.1 alpha...) ]

Igor Stasenko
On 9 April 2010 11:49, Bert Freudenberg <[hidden email]> wrote:
>
> Way cool :)
>
:)
And here is the most important piece of the puzzle, and it wooorks!!!

testMovableStuff
        "Test that if native code calls a VM function which triggers
        a full GC and the native code gets relocated, it survives the move.

        The native code returns the difference between the old and new primitive method;
        if the method was moved, the difference is nonzero."
       
        | asm fullGC primitiveMethod code |
       
        asm := AJx86AsmBuilder x86.

        primitiveMethod := NBInterpreterProxyGen functions at: #primitiveMethod.
        fullGC := NBInterpreterProxyGen functions at: #fullGC .
       
        asm
                push: EBP;
                mov:  ESP->EBP;

                mov: (asm mem: EBP) + 12 to: EAX;
                mov: (asm mem: EAX) + (primitiveMethod index * 4) to: EAX;
                call: EAX;
                push: EAX;
               
                "push function to call - fullGC"
                mov: (asm mem: EBP) + 12 to: EAX;
                mov: (asm mem: EAX) + (fullGC index * 4) to: EAX;
                push: EAX;
               
                " call gate function"
                mov: (asm mem: EBP) + 8 to: EAX;
                call: EAX;

                mov: (asm mem: EBP) + 12 to: EAX;
                mov: (asm mem: EAX) + (primitiveMethod index * 4) to: EAX;
                call: EAX;

                pop: EDX;
                sub: EAX with: EDX;
                shl: EAX with: 1;
                inc: EAX;
               
                " clear the stack "
                leave;
                ret.

        code := asm bytes.
        " we should not crash here ;) "
        10 timesRepeat:  [ Array new: 10.
                self install: code into: (self class methodDict at: #movableStub).
               
                self movableStub  ]
       
------------
       
NativeCodeTests new testMovableStuff

 #(-1581816 -2544 -3916 -2544 -2544 -2544 -2544 -2544 -2544 -2544)


This means that when the native code calls fullGC, the GC moves the code away,
but then the magic 'call gate' finds its way back to the right piece of code! :)
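
A note on how the numbers above come back as ordinary integers. The tail of the stub (sub, then shl by 1, then inc) appears to be tagging the raw difference as a SmallInteger, which in the 32-bit Squeak object format is the value shifted left by one with the low bit set. A small sketch of that arithmetic in plain Smalltalk; the block names tag and untag are made up for illustration and are not part of NativeBoost:

```smalltalk
| tag untag |
"what 'shl: EAX with: 1; inc: EAX' does to the raw difference"
tag := [:n | (n bitShift: 1) + 1].
"what the VM effectively does when it reads the tagged word back"
untag := [:oop | oop bitShift: -1].

tag value: -2544.                   "-5087"
untag value: (tag value: -2544)     "-2544"
```

Read that way, a result of -2544 would mean the method ended up 2544 bytes lower in object memory after the full GC.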




--
Best regards,
Igor Stasenko AKA sig.