Smalltalk › Squeak › Squeak VM

f2c/g77 and problem with FFI 64bits returning float


Nicolas Cellier

f2c/g77 and problem with FFI 64bits returning float

Hi,

I have a strange result with a 64-bit FFI function returning a single-precision float.

Here is an example:

(LapackSGEMatrix rows: #((2.3))) absMax.

This matrix has a single element, 2.3 rounded to a single-precision float
(2.299999952316284 when printed in double precision).

absMax is supposed to return the largest absolute value in the matrix.

It does so through the LAPACK function slange:
"
* Purpose
* =======
* SLANGE returns the value of the one norm, or the Frobenius norm, or
* the infinity norm, or the element of largest absolute value of a
* real matrix A.
"
<cdecl: float 'slange_'( char * long * long * float * long * float * long )>

Unfortunately, the above snippet returns 3.6893488147419103e19.

It correctly calls this:
floatRet = dispatchFunctionPointerwithwithwithwithwithwith(((float (*)(sqIntptr_t, sqIntptr_t, sqIntptr_t, sqIntptr_t, sqIntptr_t, sqIntptr_t)) procAddr), ((calloutState->integerRegisters))[0], ((calloutState->integerRegisters))[1], ((calloutState->integerRegisters))[2], ((calloutState->integerRegisters))[3], ((calloutState->integerRegisters))[4], ((calloutState->integerRegisters))[5]);

which translates into something like:

    0x10833c537 <+2615>: movq   -0x28(%rbp), %rax
    0x10833c53b <+2619>: movq   -0xe8(%rbp), %rcx
    0x10833c542 <+2626>: movq   0xd8(%rcx), %rdi
    0x10833c549 <+2633>: movq   -0xe8(%rbp), %rcx
    0x10833c550 <+2640>: movq   0xe0(%rcx), %rsi
    0x10833c557 <+2647>: movq   -0xe8(%rbp), %rcx
    0x10833c55e <+2654>: movq   0xe8(%rcx), %rdx
    0x10833c565 <+2661>: movq   -0xe8(%rbp), %rcx
    0x10833c56c <+2668>: movq   0xf0(%rcx), %rcx
    0x10833c573 <+2675>: movq   -0xe8(%rbp), %r8
    0x10833c57a <+2682>: movq   0xf8(%r8), %r8
    0x10833c581 <+2689>: movq   -0xe8(%rbp), %r9
    0x10833c588 <+2696>: movq   0x100(%r9), %r9
-> 0x10833c58f <+2703>: callq *%rax
    0x10833c591 <+2705>: cvtss2sd %xmm0, %xmm0
    0x10833c595 <+2709>: movsd %xmm0, -0x150(%rbp)

If I print $xmm0 just after the callq:
(lldb) nexti
(lldb) print $xmm0
(unsigned char __attribute__((ext_vector_type(16)))) $212 = (0x00, 0x00, 0x00, 0x60, 0x66, 0x66, 0x02, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00)

and just after the conversion to double precision:
(lldb) nexti
(lldb) print $xmm0
(unsigned char __attribute__((ext_vector_type(16)))) $213 = (0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x44, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00)

Let's see:

tmp := #[16r00 16r00 16r00 16r60 16r66 16r66 16r02 16r40 ].
{tmp doubleAt: 1.
tmp floatAt: 1}.
#(2.299999952316284 3.6893488147419103e19)

Bingo! That means the value returned in xmm0 was already in double precision.

When the VM reads it as a single-precision float (which amounts to interpreting the 4 least significant bytes of the double as a float) and converts that to double, we get the incorrect value...
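The byte experiment above can be replayed in C. This is a minimal sketch assuming a little-endian IEEE 754 machine such as x86-64; the function names are illustrative, and memcpy is used because it is the well-defined way to reinterpret bits in C.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* The low 8 bytes of xmm0 right after the call, as printed by lldb. */
static const unsigned char xmm0_low[8] = {
    0x00, 0x00, 0x00, 0x60, 0x66, 0x66, 0x02, 0x40
};

/* Read all 8 bytes as a double: what the f2c-style callee wrote. */
static double as_double(const unsigned char *bytes) {
    double d;
    memcpy(&d, bytes, sizeof d);
    return d;
}

/* Read only the first 4 bytes as a float: what a caller that
   expects a float return actually picks up on x86-64. */
static float as_float(const unsigned char *bytes) {
    float f;
    memcpy(&f, bytes, sizeof f);
    return f;
}

/* Mirror of the Smalltalk doubleAt:/floatAt: comparison. */
static void show_both_readings(void) {
    assert(sizeof(double) == 8 && sizeof(float) == 4);
    printf("%.17g %.17g\n", as_double(xmm0_low), (double)as_float(xmm0_low));
}
```

show_both_readings() prints the same two numbers as the Smalltalk experiment, formatted with %.17g: the widened 2.3f and 2^65.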

So why was slange result promoted to double?

I can reproduce this on Mac OS X with the pre-installed vecLib, and on Win64 with LAPACK 3.3.1 compiled from the f2c-translated sources with MSVC10.

Ah, ah, f2c! Don't you promote float return values to double? YES.
But why doesn't this happen with the 32-bit VM???

That's what kept me away from the solution for a while...

It's the IA32 ABI: floating-point results are returned in the x87 register ST0, which always holds them in extended precision, whether the callee declared float or double. The caller reads a value, not a raw bit pattern.

So converting it to a double again, as we do, is harmless and just works on 32 bits.
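The 32-bit versus 64-bit difference can be modeled in C. This is a simulation, not VM code: xmm_reg and all the function names are hypothetical, the machine is assumed to be little-endian IEEE 754 (as x86-64 is), and ST0 is modeled simply as a long double value that the caller converts.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of xmm0: a 16-byte SSE register. */
typedef struct { unsigned char bytes[16]; } xmm_reg;

/* x86-64, f2c/g77 convention: the REAL result is widened to double
   and its 8-byte bit pattern is left in xmm0. */
static void callee_returns_double(xmm_reg *xmm0, float result) {
    assert(xmm0 != NULL);
    double widened = result;
    memset(xmm0->bytes, 0, sizeof xmm0->bytes);
    memcpy(xmm0->bytes, &widened, sizeof widened);
}

/* x86-64, gfortran convention: the result stays a 4-byte float. */
static void callee_returns_float(xmm_reg *xmm0, float result) {
    assert(xmm0 != NULL);
    memset(xmm0->bytes, 0, sizeof xmm0->bytes);
    memcpy(xmm0->bytes, &result, sizeof result);
}

/* Caller declared with a float return: it reads the low 4 bytes and
   widens them, like the cvtss2sd in the disassembly. */
static double x64_caller_expecting_float(const xmm_reg *xmm0) {
    float f;
    memcpy(&f, xmm0->bytes, sizeof f);
    return (double)f;
}

/* IA32: the result travels as a value in ST0, so the caller converts
   the value instead of reinterpreting bits. */
static double ia32_caller_expecting_float(long double st0) {
    return (double)(float)st0;
}
```

With 2.3f as the result, x64_caller_expecting_float yields 2^65 after callee_returns_double and 2.3f after callee_returns_float, while ia32_caller_expecting_float yields 2.3f either way.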

That's going to be a problem for FORTRAN functions on 64 bits.

If compiled with g77 or with f2c conventions, float results are promoted to double!

If compiled with gfortran, float results just remain float results.

This is a major source of incompatibility: how can I tell how a given binary (vecLib, for example) was compiled?

And how do I adapt my FFI source code?

One last thing: f2c might also be nonstandard when returning a complex value.

Big ball of mud...

Nicolas Cellier

Re: f2c/g77 and problem with FFI 64bits returning float

2017-01-27 1:57 GMT+01:00 Nicolas Cellier <[hidden email]>:

> And how do I adapt my FFI source code?
> One last thing: f2c might also be nonstandard when returning a complex value.
> Big ball of mud...

And bingo: the Ubuntu 14 version of LAPACK is compiled with gfortran.

So the slange snippet works perfectly on 64 bits there...

Providing a Smallapack that handles all these cases is bound to be tedious.

I'm quite sure now that it is not a problem with our implementation of the FFI; sorry for the noise.

It's a problem with FFI itself: too low-level, not enough cross-language compatibility, developers choosing weird solutions (easy rather than simple). GRRR...

David T. Lewis

Re: f2c/g77 and problem with FFI 64bits returning float

On Fri, Jan 27, 2017 at 02:39:54AM +0100, Nicolas Cellier wrote:

>
> 2017-01-27 1:57 GMT+01:00 Nicolas Cellier <[hidden email]>:
> >
> > And how do I adapt my FFI source code?
> > One last thing: f2c might also be nonstandard when returning a complex value.
> > Big ball of mud...
> >
> And bingo: the Ubuntu 14 version of LAPACK is compiled with gfortran.
> So the slange snippet works perfectly on 64 bits there...
> Providing a Smallapack that handles all these cases is bound to be
> tedious.
> I'm quite sure now that it is not a problem with our implementation of the FFI;
> sorry for the noise.
> It's a problem with FFI itself: too low-level, not enough cross-language
> compatibility, developers choosing weird solutions (easy rather than
> simple). GRRR...

Hi Nicolas,

I am interested in loading Smallapack and browsing the code, mainly to look
at the external interface and see if the FFI calls could easily be done
with a VM plugin.

Can you suggest what version of Squeak and VM is good to use? I do not
care if the image is old or new, or what VM to use, just as long as I can
load Smallapack.

I am assuming that I should follow these instructions from SqueakSource:

1) load FFI

2) load Smallapack/Compiler package to allow methods with more than 15 arguments

3) load Smallapack/Collections,External,Algorithm,Matrix,Tests

4) Maybe play again with class initializations

Thanks!
Dave

David T. Lewis

Re: f2c/g77 and problem with FFI 64bits returning float

On Thu, Jan 26, 2017 at 09:54:57PM -0500, David T. Lewis wrote:

>
> Can you suggest what version of Squeak and VM is good to use? I do not
> care if the image is old or new, or what VM to use, just as long as I can
> load Smallapack.
>

D'oh!

Just noticed the ConfigurationOfSmallapack, which loads Smallapack perfectly
in the latest Squeak trunk.

Sorry for the noise,
Dave

Eliot Miranda-2

Re: f2c/g77 and problem with FFI 64bits returning float


Hi Nicolas,

isn't the solution simply to misdeclare the function as returning a double? Provided the FFI mechanism supports all the variants in use, it can collect values from all functions. That the necessary declaration doesn't match the function's declaration when compiled with a non-conforming toolchain is unfortunate, but can be lived with.

The challenge is how to communicate the issue to programmers so they can diagnose it and apply the workaround, which requires good documentation.
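In the Squeak FFI, this suggestion would presumably amount to writing double instead of float as the return type in the `<cdecl: ...>` pragma. The same idea in C is sketched here under stated assumptions: mock_slange_f2c is a hypothetical stand-in for an f2c-compiled SLANGE('M', ...) that only computes the largest absolute value, and its argument list is simplified compared with the real LAPACK routine.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an f2c-compiled SLANGE('M', ...):
   largest absolute value of a column-major m-by-n matrix. Like an
   f2c-translated REAL function, it returns a double. */
static double mock_slange_f2c(const float *a, int m, int n, int lda) {
    assert(m >= 0 && n >= 0 && lda >= (m > 1 ? m : 1));
    float best = 0.0f;
    for (int j = 0; j < n; j++) {
        for (int i = 0; i < m; i++) {
            float v = a[(size_t)i + (size_t)j * (size_t)lda];
            if (v < 0.0f)
                v = -v;
            if (v > best)
                best = v;
        }
    }
    return best; /* widened to double, as f2c does */
}

/* The workaround: declare the entry point as returning double even
   though the Fortran source says REAL, then narrow on our side. */
typedef double (*f2c_real_fn)(const float *a, int m, int n, int lda);

static float abs_max(f2c_real_fn slange, const float *a, int m, int n, int lda) {
    double widened = slange(a, m, n, lda);
    return (float)widened;
}
```

Narrowing after the call is exact here, because the double was produced by widening a float in the first place.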

_,,,^..^,,,_ (phone)

> On Jan 26, 2017, at 4:57 PM, Nicolas Cellier <[hidden email]> wrote:
>
> That's going to be a problem for FORTRAN functions on 64 bits.
> If compiled with g77 or with f2c conventions, float results are promoted to double!
> If compiled with gfortran, float results just remain float results.
> This is a major source of incompatibility: how can I tell how a given binary (vecLib, for example) was compiled?
>
> And how do I adapt my FFI source code?
>

Nicolas Cellier

Re: f2c/g77 and problem with FFI 64bits returning float

Hi Eliot,

yes: because f2c translates FORTRAN to C, it necessarily follows some fixed conventions of its own
(returning a double for REAL functions, and passing a complex result by reference in the first parameter).
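The complex-result convention can be illustrated with a hypothetical CDOTU-like mock (not the real BLAS routine; the increment arguments are omitted). complex_f is assumed to have the same layout as the complex type in f2c.h, two floats. The by-value variant models the ordinary C-style return, which is assumed here to be what a library compiled without f2c conventions (e.g. by gfortran) uses.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed to match f2c.h's complex layout: real part, then imaginary. */
typedef struct { float r, i; } complex_f;

/* f2c/g77 convention: the COMPLEX result is written through a hidden
   first pointer argument and the function itself returns nothing. */
static void mock_cdotu_f2c(complex_f *ret_val, int n,
                           const complex_f *x, const complex_f *y) {
    assert(ret_val != NULL && n >= 0);
    complex_f sum = {0.0f, 0.0f};
    for (int k = 0; k < n; k++) {
        sum.r += x[k].r * y[k].r - x[k].i * y[k].i;
        sum.i += x[k].r * y[k].i + x[k].i * y[k].r;
    }
    *ret_val = sum;
}

/* By-value convention: same computation, result returned directly. */
static complex_f mock_cdotu_by_value(int n, const complex_f *x, const complex_f *y) {
    complex_f result;
    mock_cdotu_f2c(&result, n, x, y);
    return result;
}
```

From the FFI side, the f2c variant therefore needs a declaration with a void return and an extra leading pointer argument, while the by-value variant declares the struct as its return type.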

Then g77 just followed the f2c choices.

So the workaround was to:

- add a few variants of the FFI messages matching the f2c conventions
- add a preference isCompiledWithF2Cconventions
- dispatch on this preference at the upper level
- write a page on the wiki to explain this mess and how to recognize the convention in use
  (this could be automated in a cleverer system, but it means playing with cached values and resetting them at each startup)
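The dispatching workaround can also be sketched in C. is_compiled_with_f2c_conventions is a hypothetical mirror of the isCompiledWithF2Cconventions preference, and the two mock functions stand in for the same routine built with each toolchain; detecting the convention automatically (and caching or resetting the answer at startup) is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* One generic pointer type for the looked-up symbol, plus the two
   declarations a REAL Fortran function may actually have. */
typedef void (*generic_fn)(void);
typedef double (*real_fn_f2c)(const float *x, int n);    /* f2c/g77 */
typedef float (*real_fn_native)(const float *x, int n);  /* gfortran */

/* Hypothetical mirror of the isCompiledWithF2Cconventions preference. */
static bool is_compiled_with_f2c_conventions = false;

/* Shared computation for the mocks: largest absolute value. */
static float largest_abs(const float *x, int n) {
    assert(n >= 0);
    float best = 0.0f;
    for (int k = 0; k < n; k++) {
        float v = x[k] < 0.0f ? -x[k] : x[k];
        if (v > best)
            best = v;
    }
    return best;
}

/* Hypothetical mocks of one routine built with each toolchain. */
static double mock_absmax_f2c(const float *x, int n) { return largest_abs(x, n); }
static float mock_absmax_gfortran(const float *x, int n) { return largest_abs(x, n); }

/* Upper-level entry point: pick the declaration from the preference. */
static float abs_max(generic_fn entry, const float *x, int n) {
    if (is_compiled_with_f2c_conventions)
        return (float)((real_fn_f2c)entry)(x, n);
    return ((real_fn_native)entry)(x, n);
}
```

Keeping a single generic entry pointer and casting it at the call site is standard C, as long as the final call goes through the type the function really has; choosing that type is exactly what the preference is for.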

The only catch is that I'll have to repeat this in the 2 other supported dialects...

2017-01-30 7:17 GMT+01:00 Eliot Miranda <[hidden email]>:

> Hi Nicolas,
>
> isn't the solution simply to misdeclare the function as returning a double? Provided the FFI mechanism supports all the variants in use, it can collect values from all functions. That the necessary declaration doesn't match the function's declaration when compiled with a non-conforming toolchain is unfortunate, but can be lived with.
>
> The challenge is how to communicate the issue to programmers so they can diagnose it and apply the workaround, which requires good documentation.
