f2c/g77 and problem with FFI 64bits returning float


Nicolas Cellier
Hi,
I get a strange result from a 64-bit FFI function returning a single-precision float.
Here is an example:

(LapackSGEMatrix rows: #((2.3))) absMax.

This matrix has a single element: 2.3 rounded to a single-precision float
(2.299999952316284 when printed in double precision).

absMax is supposed to return the maximum of the absolute values in the matrix.
It does so through the LAPACK function slange:
"
*  Purpose
*  =======
*  SLANGE  returns the value of the one norm,  or the Frobenius norm, or
*  the  infinity norm,  or the  element of  largest absolute value  of a
*  real matrix A.
"
    <cdecl: float 'slange_'( char * long * long * float * long * float * long )>

Unfortunately, the snippet above returns 3.6893488147419103e19.

It correctly calls this:
            floatRet = dispatchFunctionPointerwithwithwithwithwithwith(
                ((float (*)(sqIntptr_t, sqIntptr_t, sqIntptr_t, sqIntptr_t, sqIntptr_t, sqIntptr_t)) procAddr),
                ((calloutState->integerRegisters))[0], ((calloutState->integerRegisters))[1],
                ((calloutState->integerRegisters))[2], ((calloutState->integerRegisters))[3],
                ((calloutState->integerRegisters))[4], ((calloutState->integerRegisters))[5]);

which translates into something like:

    0x10833c537 <+2615>: movq   -0x28(%rbp), %rax
    0x10833c53b <+2619>: movq   -0xe8(%rbp), %rcx
    0x10833c542 <+2626>: movq   0xd8(%rcx), %rdi
    0x10833c549 <+2633>: movq   -0xe8(%rbp), %rcx
    0x10833c550 <+2640>: movq   0xe0(%rcx), %rsi
    0x10833c557 <+2647>: movq   -0xe8(%rbp), %rcx
    0x10833c55e <+2654>: movq   0xe8(%rcx), %rdx
    0x10833c565 <+2661>: movq   -0xe8(%rbp), %rcx
    0x10833c56c <+2668>: movq   0xf0(%rcx), %rcx
    0x10833c573 <+2675>: movq   -0xe8(%rbp), %r8
    0x10833c57a <+2682>: movq   0xf8(%r8), %r8
    0x10833c581 <+2689>: movq   -0xe8(%rbp), %r9
    0x10833c588 <+2696>: movq   0x100(%r9), %r9
->  0x10833c58f <+2703>: callq  *%rax
    0x10833c591 <+2705>: cvtss2sd %xmm0, %xmm0
    0x10833c595 <+2709>: movsd  %xmm0, -0x150(%rbp)

If I print $xmm0 just after the callq, then
(lldb) nexti
(lldb) print $xmm0
(unsigned char __attribute__((ext_vector_type(16)))) $212 = (0x00, 0x00, 0x00, 0x60, 0x66, 0x66, 0x02, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00)

and just after the conversion to double precision:
(lldb) nexti
(lldb) print $xmm0
(unsigned char __attribute__((ext_vector_type(16)))) $213 = (0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x44, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00)

Let's see:

tmp := #[16r00   16r00   16r00   16r60   16r66   16r66   16r02   16r40 ].
{tmp doubleAt: 1.
tmp floatAt: 1}.
 #(2.299999952316284 3.6893488147419103e19)

Bingo! That means the value returned in xmm0 was already in double precision.
Since the VM expects a float, cvtss2sd takes the 4 least significant bytes of that double, interprets them as a single-precision float and widens that to double, which gives the incorrect value...

So why was the slange result promoted to double?
I can reproduce this on Mac OS X with the pre-installed vecLib, and on Win64 with LAPACK 3.3.1 compiled from source (translated by f2c) with MSVC10.

Ah, ah, f2c! Don't you promote float return values to double? YES.
But why doesn't this happen with the 32-bit VM???
That's what threw me off the solution for a while...
It's the IA32 ABI: the return value comes back in the x87 register ST0, always in extended precision, whether the callee returns a float or a double.
So converting it to a double, as we do, is harmless and just works on 32 bits.

That's going to be a problem for FORTRAN functions on 64 bits.
If compiled with g77 or with f2c conventions, float results are promoted to double!
If compiled with gfortran, float results just remain float.
This is a major source of incompatibility: how can we guess how a given binary was compiled (vecLib, for example)?

And how should I adapt my FFI source code?
Last thing: f2c may also be nonstandard when returning a complex value.
Big ball of mud...


Re: f2c/g77 and problem with FFI 64bits returning float

Nicolas Cellier

2017-01-27 1:57 GMT+01:00 Nicolas Cellier <[hidden email]>:
[...]

And bingo: the Ubuntu 14 version of LAPACK is compiled with gfortran.
So the slange snippet works perfectly on 64 bits there...
Providing a Smallapack for all these cases is necessarily going to be boring.
I'm quite sure now that it is not a problem with our implementation of FFI, sorry for the noise.
It's a problem of FFI itself: too low level, not enough cross-language compatibility, developers choosing weird solutions (easy rather than simple). GRRR...

Re: f2c/g77 and problem with FFI 64bits returning float

David T. Lewis
On Fri, Jan 27, 2017 at 02:39:54AM +0100, Nicolas Cellier wrote:

>  
> 2017-01-27 1:57 GMT+01:00 Nicolas Cellier <[hidden email]>:
> >
> > And how to adapt my FFI source code?
> > Last thing, f2c might also be non standard when returning a complex value
> > Big ball of mud...
> >
> > And bingo, ubuntu 14 version of lapack is compiled with gfortran.
> So the slange snippet works perfectly on 64bits there...
> Providing a Smallapack for all these cases is necessarily going to be
> boring.
> I'm quite sure now that it is not a problem of our implementation of FFI,
> sorry for the noise.
> It's a problem of FFI itself: too low level, not enough cross language
> compatibility, developpers choosing weird solutions (easy rather than
> simple). GRRR...

Hi Nicolas,

I am interested in loading Smallapack and browsing the code, mainly to look
at the external interface and see whether the FFI calls could easily be done
with a VM plugin.

Can you suggest what version of Squeak and VM is good to use? I do not
care if the image is old or new, or what VM to use, just as long as I can
load Smallapack.

I am assuming that I should follow these instructions from SqueakSource:

   1) load FFI
   
   2) load the Smallapack/Compiler package to allow methods with more than 15 arguments
   
   3) load Smallapack/Collections,External,Algorithm,Matrix,Tests
   
   4) Maybe play again with class initializations

Thanks!
Dave


Re: f2c/g77 and problem with FFI 64bits returning float

David T. Lewis
On Thu, Jan 26, 2017 at 09:54:57PM -0500, David T. Lewis wrote:

> [...]

D'oh!

Just noticed the ConfigurationOfSmallapack, which loads Smallapack perfectly
in the latest Squeak trunk.

Sorry for the noise,
Dave

Re: f2c/g77 and problem with FFI 64bits returning float

Eliot Miranda-2
In reply to this post by Nicolas Cellier
Hi Nicolas,

    isn't the solution simply to misdeclare the function as returning a double? Provided the FFI mechanism supports all the variants in use, it can collect values from all functions. That the necessary declaration doesn't match the function's declaration when compiled with a non-conforming toolchain is unfortunate but can be lived with.

The issue is how to communicate the issue to programmers so they can diagnose and apply the work-around, which requires good documentation.

_,,,^..^,,,_ (phone)

> On Jan 26, 2017, at 4:57 PM, Nicolas Cellier <[hidden email]> wrote:
> [...]

Re: f2c/g77 and problem with FFI 64bits returning float

Nicolas Cellier
Hi Eliot,
yes. Because f2c translates FORTRAN to C, it necessarily follows some C conventions
(returning a double, and passing a complex result by reference as the first parameter).
Then g77 just followed the f2c choices.
So the workaround was to:
- add a few variants of the FFI messages matching the f2c conventions
- add a preference isCompiledWithF2Cconventions
- dispatch on this preference at the upper level
- write a wiki page explaining this mess and how to recognize the convention in use
  (this could be automated in a cleverer system, but that means playing with cached values and resetting them at each startup)

It's just that I'll have to repeat all this in the two other supported dialects...

2017-01-30 7:17 GMT+01:00 Eliot Miranda <[hidden email]>:

[...]