I had quite a bit of fun with trying to achieve bit-identical floating
point behavior across platforms for Croquet. Turns out our claim about
"bit identical" behavior is not so bit-identical when it comes to
floating point after all. Reading up on IEEE754 and some of the more
interesting fpu options like extended internal fpu accuracy or fused
multiply-add (all of which are "consistent" with IEEE754 although
delivering different results) I have found a wonderful little library
(http://www.netlib.org/fdlibm/) which deals very well with these issues.
Using it, I have been able to get truly bit-identical results between
x87 and PPC and I think it likely that other architectures (Sparc, Arm,
Mips) can be made to work with this library as well.
Which raises an interesting question: Should we change the VM so that by
default we *always* use that library? It would mean replacing our
boiler-plate #include<math.h> by an #include<fdlibm.h> and having to
link everything (both the VM as well as external plugins if needed) with
the library, the cost of which is somewhere in the 60kbytes range.
Together with a few platform specific tweaks (like setting the default
accuracy on x87 to 53 bits mantissa, or disabling fused-madd on PPC) it
*should* make floating point behave perfectly bit-identically across
A word about the runtime efficiency: The overhead seems to be fairly
acceptable for the most common uses. Using the primitives, I have been
unable to measure noticable differences between the current primitives
and the fdlibm primitives *except* for the implementation of sqrt(x)
which appears to be significantly faster in hardware. However, both on
x87 and PPC it turns out that the results are bit-identical both across
the architectures as well as in comparison to fdlibm. And that in turn
means we can use the hardware sqrt directly. Note that none of the other
implementations give identical results; in particular when it comes to
boundary or extreme cases (like huge exponents) the results can *vastly*
differ - up to the point that x87 will tell you that sin(1.0e100) is NaN
which obviously ridiculous. fdlibm computes this to -0.38 and I have no
idea if that's to be considered "correct" but at least we get consistent
results across the various platforms.
What do you think? It is worth using fdlibm generally instead of -lm?
PS. If somebody has access to Sparc/Mips/Arm I would be interested to
see how they behave.