Hi,
I have a strange result with a 64-bit FFI function returning a single-precision float.
Here is an example:

    (LapackSGEMatrix rows: #((2.3))) absMax.

This matrix has a single element, 2.3 rounded to a single-precision float
(2.299999952316284 when printed as a double precision).

absMax is supposed to take the maximum of the absolute values in the matrix.
It does so through the LAPACK function slange:

    "
    * Purpose
    * =======
    * SLANGE returns the value of the one norm, or the Frobenius norm, or
    * the infinity norm, or the element of largest absolute value of a
    * real matrix A.
    "
    <cdecl: float 'slange_'( char * long * long * float * long * float * long )>

Unfortunately the above snippet returns 3.6893488147419103e19.

It correctly calls this:

    floatRet = dispatchFunctionPointerwithwithwithwithwithwith(
        ((float (*)(sqIntptr_t, sqIntptr_t, sqIntptr_t, sqIntptr_t, sqIntptr_t, sqIntptr_t)) procAddr),
        ((calloutState->integerRegisters))[0],
        ((calloutState->integerRegisters))[1],
        ((calloutState->integerRegisters))[2],
        ((calloutState->integerRegisters))[3],
        ((calloutState->integerRegisters))[4],
        ((calloutState->integerRegisters))[5]);

which translates into something like:

       0x10833c537 <+2615>: movq -0x28(%rbp), %rax
       0x10833c53b <+2619>: movq -0xe8(%rbp), %rcx
       0x10833c542 <+2626>: movq 0xd8(%rcx), %rdi
       0x10833c549 <+2633>: movq -0xe8(%rbp), %rcx
       0x10833c550 <+2640>: movq 0xe0(%rcx), %rsi
       0x10833c557 <+2647>: movq -0xe8(%rbp), %rcx
       0x10833c55e <+2654>: movq 0xe8(%rcx), %rdx
       0x10833c565 <+2661>: movq -0xe8(%rbp), %rcx
       0x10833c56c <+2668>: movq 0xf0(%rcx), %rcx
       0x10833c573 <+2675>: movq -0xe8(%rbp), %r8
       0x10833c57a <+2682>: movq 0xf8(%r8), %r8
       0x10833c581 <+2689>: movq -0xe8(%rbp), %r9
       0x10833c588 <+2696>: movq 0x100(%r9), %r9
    -> 0x10833c58f <+2703>: callq *%rax
       0x10833c591 <+2705>: cvtss2sd %xmm0, %xmm0
       0x10833c595 <+2709>: movsd %xmm0, -0x150(%rbp)

If I print $xmm0 just after the callq, then:

    (lldb) nexti
    (lldb) print $xmm0
    (unsigned char __attribute__((ext_vector_type(16)))) $212 = (0x00, 0x00, 0x00, 0x60, 0x66, 0x66, 0x02, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00)

and just after the conversion to double precision:

    (lldb) nexti
    (lldb) print $xmm0
    (unsigned char __attribute__((ext_vector_type(16)))) $213 = (0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x44, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00)

Let's see:

    tmp := #[16r00 16r00 16r00 16r60 16r66 16r66 16r02 16r40 ].
    {tmp doubleAt: 1.
    tmp floatAt: 1}.
    #(2.299999952316284 3.6893488147419103e19)

Bingo! That means that the value returned in xmm0 was already in double precision.
When we then treat it as single precision (which amounts to interpreting the 4 least significant bytes of the double as a single-precision float), we get the incorrect value...

So why was the slange result promoted to double?
I can reproduce this on Mac OS X with the pre-installed vecLib, and on win64 compiling LAPACK 3.3.1 from sources (translated by f2c) with MSVC10.

Ah, ah, f2c! Don't you promote float return values to double? YES
But why does this not happen with the 32-bit VM???
That's what threw me off the track for a while...
It's the IA32 ABI: the return value is stored in ST0 (always promoted to double).
So converting it to a double again like we do is a no-op and just works in 32 bits.

That's going to be a problem for FORTRAN functions on 64 bits.
IF compiled with g77 or f2c conventions, then float results are promoted to double!
IF compiled with gfortran, then float results just remain float results.
It means a major source of incompatibility: how to guess how this binary was compiled? (for example vecLib...)

And how to adapt my FFI source code?
Last thing, f2c might also be non-standard when returning a complex value.
Big ball of mud...
2017-01-27 1:57 GMT+01:00 Nicolas Cellier <[hidden email]>:
And bingo, the Ubuntu 14 version of LAPACK is compiled with gfortran.
So the slange snippet works perfectly on 64 bits there...
Providing a Smallapack for all these cases is necessarily going to be boring.
I'm quite sure now that it is not a problem with our implementation of FFI; sorry for the noise.
It's a problem of FFI itself: too low level, not enough cross-language compatibility, developers choosing weird solutions (easy rather than simple). GRRR...
On Fri, Jan 27, 2017 at 02:39:54AM +0100, Nicolas Cellier wrote:
>
> 2017-01-27 1:57 GMT+01:00 Nicolas Cellier <[hidden email]>:
>
> > And how to adapt my FFI source code?
> > Last thing, f2c might also be non standard when returning a complex value
> > Big ball of mud...
> >
>
> And bingo, ubuntu 14 version of lapack is compiled with gfortran.
> So the slange snippet works perfectly on 64bits there...
> Providing a Smallapack for all these cases is necessarily going to be
> boring.
> I'm quite sure now that it is not a problem of our implementation of FFI,
> sorry for the noise.
> It's a problem of FFI itself: too low level, not enough cross language
> compatibility, developers choosing weird solutions (easy rather than
> simple). GRRR...

Hi Nicolas,

I am interested in loading Smallapack and browsing the code, mainly to look
at the external interface and see if the FFI calls could easily be done
with a VM plugin.

Can you suggest what version of Squeak and VM is good to use? I do not
care if the image is old or new, or what VM to use, just as long as I can
load Smallapack.

I am assuming that I should follow these instructions from SqueakSource:

1) load FFI

2) load Smallapack/Compiler package to allow methods with more than 15 arguments

3) load Smallapack/Collections,External,Algorithm,Matrix,Tests

4) Maybe play again with class initializations

Thanks!
Dave
On Thu, Jan 26, 2017 at 09:54:57PM -0500, David T. Lewis wrote:
>
> Hi Nicolas,
>
> I am interested to load Smallapack and browse the code, mainly to look
> at the external interface and see if the FFI calls could easily be done
> with a VM plugin.
>
> Can you suggest what version of Squeak and VM is good to use? I do not
> care if the image is old or new, or what VM to use, just as long as I can
> load Smallapack.
>
> I am assuming that I should follow these instructions from SqueakSource:
>
> 1) load FFI
>
> 2) load Smallapack/Compiler package to methods with allow more than 15 arguments
>
> 3) load Smallapack/Collections,External,Algorithm,Matrix,Tests
>
> 4) Maybe play again with class initializations
>
> Thanks!
> Dave
>

D'oh! Just noticed the ConfigurationOfSmallapack, which loads Smallapack
perfectly in the latest Squeak trunk.

Sorry for the noise,
Dave
In reply to this post by Nicolas Cellier
Hi Nicolas,

isn't the solution simply to misdeclare the function as returning a double? Provided the FFI mechanism supports all variants in use, it can collect values from all functions. That the necessary declaration doesn't match the function's declaration when compiled with a non-conforming toolchain is unfortunate but can be lived with. The issue is how to communicate this to programmers so they can diagnose the problem and apply the work-around, which requires good documentation.

_,,,^..^,,,_ (phone)

> On Jan 26, 2017, at 4:57 PM, Nicolas Cellier <[hidden email]> wrote:
>
> Hi,
> I have strange result with 64bits FFI function returning single precision float.
> Here is an example:
>
> (LapackSGEMatrix rows: #((2.3))) absMax.
>
> This matrix has a single element, 2.3 rounded to single precision float
> (2.299999952316284 when printed as a double precision)
>
> absMax is supposed to take the maximum of absolute values in the matrix.
> It does so thru Lapack function slange:
> "
> * Purpose
> * =======
> * SLANGE returns the value of the one norm, or the Frobenius norm, or
> * the infinity norm, or the element of largest absolute value of a
> * real matrix A.
> "
> <cdecl: float 'slange_'( char * long * long * float * long * float * long )>
>
> Unfortunately above snippet returns 3.6893488147419103e19
>
> It correctly calls this:
> floatRet = dispatchFunctionPointerwithwithwithwithwithwith(((float (*)(sqIntptr_t, sqIntptr_t, sqIntptr_t, sqIntptr_t, sqIntptr_t, sqIntptr_t)) procAddr), ((calloutState->integerRegisters))[0], ((calloutState->integerRegisters))[1], ((calloutState->integerRegisters))[2], ((calloutState->integerRegisters))[3], ((calloutState->integerRegisters))[4], ((calloutState->integerRegisters))[5]);
>
> which translates into something like:
>
>    0x10833c537 <+2615>: movq -0x28(%rbp), %rax
>    0x10833c53b <+2619>: movq -0xe8(%rbp), %rcx
>    0x10833c542 <+2626>: movq 0xd8(%rcx), %rdi
>    0x10833c549 <+2633>: movq -0xe8(%rbp), %rcx
>    0x10833c550 <+2640>: movq 0xe0(%rcx), %rsi
>    0x10833c557 <+2647>: movq -0xe8(%rbp), %rcx
>    0x10833c55e <+2654>: movq 0xe8(%rcx), %rdx
>    0x10833c565 <+2661>: movq -0xe8(%rbp), %rcx
>    0x10833c56c <+2668>: movq 0xf0(%rcx), %rcx
>    0x10833c573 <+2675>: movq -0xe8(%rbp), %r8
>    0x10833c57a <+2682>: movq 0xf8(%r8), %r8
>    0x10833c581 <+2689>: movq -0xe8(%rbp), %r9
>    0x10833c588 <+2696>: movq 0x100(%r9), %r9
> -> 0x10833c58f <+2703>: callq *%rax
>    0x10833c591 <+2705>: cvtss2sd %xmm0, %xmm0
>    0x10833c595 <+2709>: movsd %xmm0, -0x150(%rbp)
>
> If I print $xmm0 just after the callq, then
> (lldb) nexti
> (lldb) print $xmm0
> (unsigned char __attribute__((ext_vector_type(16)))) $212 = (0x00, 0x00, 0x00, 0x60, 0x66, 0x66, 0x02, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00)
>
> and just after the conversion to double precision:
> (lldb) nexti
> (lldb) print $xmm0
> (unsigned char __attribute__((ext_vector_type(16)))) $213 = (0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x44, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00)
>
> Let's see:
>
> tmp := #[16r00 16r00 16r00 16r60 16r66 16r66 16r02 16r40 ].
> {tmp doubleAt: 1.
> tmp floatAt: 1}.
> #(2.299999952316284 3.6893488147419103e19)
>
> Bingo! that means that the value returned in xmm0 was already in double precision.
> When we convert it back to single precision (it's like interpreting the 4 LSB of the double as a single precision), then we get the incorrect value...
>
> So why was slange result promoted to double?
> I can reproduce on macosx with pre-installed veclib, and in win64 compiling LAPACK 3.3.1 from sources (translated by f2c) with MSVC10.
>
> Ah, Ah, f2c! Don't you promote float return values to double? YES
> But why this does not happen with the 32bits VM ???
> That's what drove me off the solution for a while...
> It's the IA32 ABI... return value is stored in ST0 (always promoted to double).
> So converting it to a double again like we do is a no-op and just works in 32bits.
>
> That's going to be a problem for FORTRAN functions on 64bits.
> IF compiled thru g77 or f2c conventions, then float results are promoted to double!
> IF compiled thru gfortran, then float result just remain float results.
> It means a major source of incompatibility: how to guess how this binary was compiled? (for example vecLib...)
>
> And how to adapt my FFI source code?
> Last thing, f2c might also be non standard when returning a complex value
> Big ball of mud...
>
Hi Eliot,
yes, because f2c translates FORTRAN to C, it necessarily follows some standard conventions (returning a double, and passing a complex result in the first parameter by reference).
Then g77 just followed the f2c choices.

So the workaround was to:
- add a preference isCompiledWithF2Cconventions (this might be automated in a more clever system, but it means playing with cached values and resetting at each startup)

It's just that I'll have to repeat this in the 2 other supported dialects...

2017-01-30 7:17 GMT+01:00 Eliot Miranda <[hidden email]>: