On Jan 9, 2009, at 3:08 AM, Paolo Bonzini wrote:

>> Is it at all possible to get the x86 FPU to produce the correct result
>> for this situation?
>
> You can also try compiling the Squeak VM with -mfpmath=sse.

That should work, as long as you are running it on x86 processors that support SSE2 with double-precision scalar operations. The SSE floating-point unit(s) perform true single- and double-precision arithmetic.

-- tim
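For anyone building the VM this way, GCC exposes a predefined macro that makes the configuration checkable at compile time. A minimal sketch (the macro is GCC-specific; other compilers would need their own check):

    /* GCC defines __SSE2_MATH__ when double-precision math is actually
       compiled to SSE2 scalar instructions (-msse2 -mfpmath=sse).
       Dropping this into a VM source file makes a misconfigured build
       fail loudly instead of silently double-rounding on the x87. */
    #ifndef __SSE2_MATH__
    # error "doubles are not using SSE2 math; expect x87 double rounding"
    #endif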
In reply to this post by Hans-Martin Mosner
On Jan 9, 2009, at 2:29 AM, Hans-Martin Mosner wrote:

> First of all, does it matter? If I understand correctly, this behavior
> is only present for denormalized numbers.

If you set the internal rounding-precision mode in the x86 control register to double precision, then yes, the multiple-rounding issue goes away for most computations. It remains only when generating denormal results, because the exponent field stays at extended-precision size during the computation, so the denormalized result is generated only during the conversion back to double-precision format, leading to multiple rounding operations.

> Do these appear in real-world cases?

That's hard to say. It might be interesting to instrument the VM to check for denormal operands or results in the float operations, to get a feel for how often (if ever) they occur.

> I've tried to analyze the case in question and came to the following
> results:
> The exact mantissa after multiplication is 3B16EF930A76E.80002C69F96C2
> (the hex digits after the point are those that should be rounded off
> when going to a 52-bit mantissa). The result with "A76F" as the last hex
> digits would therefore be the correct value for an IEEE-754 double
> precision multiplication (rounding to the nearest representable number),
> so the PPC implementation does it right.
> When doing an extended double precision multiplication, there are some
> more bits in the mantissa, and the mantissa of the intermediate result
> looks like ...A76E.800 which is exactly in the middle between two
> representable numbers. Converting to double precision involves rounding
> the mantissa again, and the rounding rule for this case (exact middle)
> says to round to the nearest even number, which is ...A76E.

That's roughly what is going on, but it is complicated here by the way denormals are handled. What actually happens is:

1st operand (binary):
1.0011001000001000101000100101111000000100111010000111
(note the "hidden" 1 bit to the left of the radix point, because it is a normalized value)

2nd operand (binary):
0.0011000101101101110100011101000000101101000110101110
(note no "hidden" 1 bit, because it is a denormalized value)

Before multiplication, the 2nd operand is renormalized, adjusting the exponent accordingly:
1.1000101101101110100011101000000101101000110101110000

The exact product is:
1.1101100010110111011111001001100001010011101101110100,0000000000000001...
The comma (,) marks the double-precision rounding position.

PPC operation:
The product exponent is smaller than can be represented in normalized double-precision format, so the result is first denormalized:
0.0011101100010110111011111001001100001010011101101110,1000000000000000001...
Then round-to-nearest adds an ULP, because the bits to the right of the rounding position are greater than halfway:
0.0011101100010110111011111001001100001010011101101111

x86 operation (extended precision, rounding-precision mode set to double):
Because the exponent field is still sized for 80-bit extended floats, the exact product is still representable as a normalized number:
1.1101100010110111011111001001100001010011101101110100,0000000000000001...
Round-to-nearest then drops the bits to the right of the double-precision rounding point without adding an ULP, because they are less than halfway:
1.1101100010110111011111001001100001010011101101110100
The result is then converted to double-precision representation when storing to memory, which causes it to become denormalized:
0.0011101100010110111011111001001100001010011101101110,100
This time round-to-nearest does not add an ULP: the bits to the right of the rounding position are exactly halfway, and in that case the nearest even result is selected:
0.0011101100010110111011111001001100001010011101101110

> Is it at all possible to get the x86 FPU to produce the correct result
> for this situation?

Not with the extended-precision FPU, short of an extreme performance loss: you would basically have to raise an exception on every inexact result (lots of them!) and check each one in software. If you perform the operations in the SSE unit instead, that works, provided all the x86 platforms you run on support SSE2 with double-precision scalars.

> Interestingly, this article
> (http://www.vinc17.org/research/extended.en.html) claims that the FPU is
> set up for incorrect rounding under Linux only, but I could not
> reproduce the test case given there with Squeak (which probably means
> that I mistranslated the Java example).

That example shows the effect for normalized intermediate results: the intermediate result is rounded to extended precision, then rounded again when converting to double-precision format. That can be fixed by setting the rounding-precision mode to double in the FPU control register, but, as shown in the example above, the fix does not cover denormalized intermediate results. I suspect that your Squeak VM has the rounding-precision mode set to double, which fixes most of the cases.

-- tim
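The whole walkthrough can be reproduced with a few lines of C. A minimal sketch (compiler flags are illustrative: an x87 build such as gcc -m32 -mfpmath=387 should print ...a76e, an SSE2 build with -msse2 -mfpmath=sse should print ...a76f):

    /* Reproduce the thread's test vector from exact bit patterns. */
    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    static double from_bits(uint64_t bits)
    {
        double d;
        memcpy(&d, &bits, sizeof d);   /* bit-exact; no decimal conversion */
        return d;
    }

    int main(void)
    {
        double a = from_bits(0x3ff3208a25e04e87ULL);  /* between 1.0 and 2.0 */
        double b = from_bits(0x000316dd1d02d1aeULL);  /* a denormal value */
        double p = a * b;
        uint64_t bits;
        memcpy(&bits, &p, sizeof bits);
        printf("%016llx\n", (unsigned long long)bits);
        return 0;
    }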
In reply to this post by Hans-Martin Mosner
On Jan 8, 2009, at 9:30 AM, Hans-Martin Mosner wrote:

> AFAIK, the Croquet project needs to have exactly reproducible float
> arithmetics on all platforms.

Are you using the exact same code on all platforms for transcendentals and other floating-point functions? If not, that is going to be the greatest source of different results. They tend to vary widely in accuracy, since there is no 1/2-ULP accuracy requirement for those functions, unlike for the defined IEEE operations (+, -, *, /, sqrt).

-- Tim Olson
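To see this concretely, one can print the raw bits of a libm result and diff them across platforms. A sketch (link with -lm where needed; the argument 1e22 is merely a well-known stress case for trigonometric argument reduction):

    /* Print a transcendental result bit-for-bit so platform libms can
       be compared; +, -, *, / and sqrt would match, sin() may not. */
    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>
    #include <math.h>

    int main(void)
    {
        double r = sin(1e22);
        uint64_t bits;
        memcpy(&bits, &r, sizeof bits);
        printf("sin(1e22) bits: %016llx\n", (unsigned long long)bits);
        return 0;
    }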
On 09.01.2009, at 16:16, Tim Olson wrote:

> On Jan 8, 2009, at 9:30 AM, Hans-Martin Mosner wrote:
>
>> AFAIK, the Croquet project needs to have exactly reproducible float
>> arithmetics on all platforms.
>
> Are you using the exact same code on all platforms for transcendentals
> and other floating-point functions? If not, that is going to be the
> greatest source of different results. They tend to vary widely in
> accuracy, since there is no 1/2-ULP accuracy requirement for those
> functions, unlike for the defined IEEE operations (+, -, *, /, sqrt).

Yes, Andreas wrote a new plugin to ensure bit-identical floating-point ops across platforms.

- Bert -
In reply to this post by Tim Olson-4
On 09.01.2009, at 05:31, Tim Olson wrote:

> On Jan 8, 2009, at 9:30 AM, Hans-Martin Mosner wrote:
>
>> AFAIK, the Croquet project needs to have exactly reproducible float
>> arithmetics on all platforms. Would these rounding effects on the x86
>> affect that goal (perhaps with denormal intermediate results)?
>
> Yes.
>
>> Maybe it would be nice to write a unit test which would uncover
>> different rounding behaviors between purely-double-precision FP
>> hardware and extended-precision hardware.
>
> I'd have to play around a bit to get this into a Squeak unit test,
> but here's a test vector which shows the effect:
>
> 3ff3208a25e04e87 * 000316dd1d02d1ae
>
> x86 result:     0003b16ef930a76e
> powerPC result: 0003b16ef930a76f
>
> These are IEEE double-precision floating-point numbers, specified in hex
> (big-endian) so that they are bit-exact (no conversion error from a
> decimal representation). The first number is between 1.0 and 2.0, while
> the second number is a denormal value. The product shows an lsb
> difference between an x86 platform (extended-precision FPU) and a PPC
> platform (double-precision FPU), even when the x86 floating-point
> control word is set to use double-precision rounding for all results.

On my Intel Mac with VM 3.8.17b6 I get ...a76f too, so this is not an x86 issue.

  a := Float newFrom: #(16r3FF3208A 16r25E04E87).
  b := Float newFrom: #(16r000316DD 16r1D02D1AE).
  {a hex. b hex. (a*b) hex}
    "=> #('3FF3208A25E04E87' '000316DD1D02D1AE' '0003B16EF930A76F')"

I do get ...a76e under Linux (x86 VMware emulation).

- Bert -
In reply to this post by Bert Freudenberg
Years back we coded up a FloatMathPlugin to do IEEE math, a front end to http://www.netlib.org/fdlibm/, in April of '06. Testing showed we could do IEEE math using the plugin on Linux/Windows/Mac (PowerPC). I believe Qwaq took over building this for Mac/Intel.

Also note that the Unix and Carbon VMs do fiddle with the FP unit control register via the code below (this is from the copy I have; you would need to check the build tree for the Unix version to see if it is different).

#if defined(__GNUC__) && ( defined(i386) || defined(__i386) || defined(__i386__) \
                        || defined(i486) || defined(__i486) || defined (__i486__) \
                        || defined(intel) || defined(x86) || defined(i86pc) )
  static void fldcw(unsigned int cw)
  {
    __asm__("fldcw %0" :: "m"(cw));
  }
#else
# define fldcw(cw)
#endif

#if defined(__GNUC__) && ( defined(ppc) || defined(__ppc) || defined(__ppc__) \
                        || defined(POWERPC) || defined(__POWERPC) || defined (__POWERPC__) )
  void mtfsfi(unsigned long long fpscr);
  void mtfsfi(unsigned long long fpscr)
  {
    __asm__("lfd f0, %0" :: "m"(fpscr));
    __asm__("mtfsf 0xff, f0");
  }
#else
# define mtfsfi(fpscr)
#endif

On 9-Jan-09, at 7:20 AM, Bert Freudenberg wrote:

> On 09.01.2009, at 16:16, Tim Olson wrote:
>
>> On Jan 8, 2009, at 9:30 AM, Hans-Martin Mosner wrote:
>>
>>> AFAIK, the Croquet project needs to have exactly reproducible float
>>> arithmetics on all platforms.
>>
>> Are you using the exact same code on all platforms for transcendentals
>> and other floating-point functions? If not, that is going to be the
>> greatest source of different results. They tend to vary widely in
>> accuracy, since there is no 1/2-ULP accuracy requirement for those
>> functions, unlike for the defined IEEE operations (+, -, *, /, sqrt).
>
> Yes, Andreas wrote a new plugin to ensure bit-identical floating-point
> ops across platforms.
>
> - Bert -

--
===========================================================================
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
===========================================================================
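For reference, a plausible way to use the x86 helper above to request double-precision rounding (a sketch: 0x027F is the x87 power-up default control word 0x037F with the precision-control field, bits 8-9, changed from extended to double; check your platform's startup code before assuming this value):

    /* Request 53-bit (double) rounding precision, round-to-nearest.
       As discussed earlier in the thread, this does NOT fix the
       denormal double-rounding case. */
    fldcw(0x027F);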
In reply to this post by Tim Olson-4
Tim Olson wrote:
> On Jan 8, 2009, at 9:30 AM, Hans-Martin Mosner wrote:
>
>> AFAIK, the Croquet project needs to have exactly reproducible float
>> arithmetics on all platforms.

Slightly off-topic, but since we have such a knowledgeable crowd...

As noted by John, Croquet uses fdlibm for bit-identical floating-point math. Does anyone have a feeling for how difficult (or impossible) it will be to achieve identical computation on OpenCL-compliant devices? For example, how difficult would it be to port, say, fdlibm, so that transcendentals use the exact same code? Any other show-stoppers that might not occur to the naive mind :-) ?

Thanks,
Josh

> Are you using the exact same code on all platforms for transcendentals
> and other floating-point functions? If not, that is going to be the
> greatest source of different results. They tend to vary widely in
> accuracy, since there is no 1/2-ULP accuracy requirement for those
> functions, unlike for the defined IEEE operations (+, -, *, /, sqrt).
>
> -- Tim Olson
In reply to this post by Bert Freudenberg
>> 3ff3208a25e04e87 * 000316dd1d02d1ae
>>
>> x86 result:     0003b16ef930a76e
>> powerPC result: 0003b16ef930a76f
>
> On my Intel Mac with VM 3.8.17b6 I get ...a76f too, so this is not an
> x86 issue.

Mac OS X uses SSE by default for math. It is an *x87* issue, not x86.

Paolo
In reply to this post by Joshua Gargus-2
On 10.01.2009, at 11:01, Josh Gargus wrote:
> As noted by John, Croquet uses fdlibm for bit-identical floating-point
> math. Does anyone have a feeling for how difficult (or impossible) it
> will be to achieve identical computation on OpenCL-compliant devices?

The numerical behavior of compliant OpenCL implementations is covered in section 7 of the OpenCL spec. In particular, table 7.1 gives the error bounds for the various operations. If I interpret it correctly, very few functions are guaranteed to behave bit-identically.

> For example, how difficult would it be to port, say, fdlibm, so that
> transcendentals use the exact same code? Any other show-stoppers that
> might not occur to the naive mind :-) ?

Well, OpenCL only requires single precision; double-precision support is optional, whereas fdlibm is double-precision only. I don't know how compliant current implementations actually are.

- Bert -
Bert Freudenberg wrote:
> On 10.01.2009, at 11:01, Josh Gargus wrote:
>
>> As noted by John, Croquet uses fdlibm for bit-identical floating-point
>> math. Does anyone have a feeling for how difficult (or impossible) it
>> will be to achieve identical computation on OpenCL-compliant devices?
>
> The numerical behavior of compliant OpenCL implementations is covered
> in section 7 of the OpenCL spec. In particular, table 7.1 gives the
> error bounds for the various operations. If I interpret it correctly,
> very few functions are guaranteed to behave bit-identically.

Oops, I was skimming by the time I read that part of the spec. I saw that the transcendental functions need not return identical results (which was why I mentioned porting fdlibm), but I missed that even x/y isn't precisely specified.

>> For example, how difficult would it be to port, say, fdlibm, so that
>> transcendentals use the exact same code? Any other show-stoppers that
>> might not occur to the naive mind :-) ?
>
> Well, OpenCL only requires single precision; double-precision support
> is optional, whereas fdlibm is double-precision only.

I didn't know that. Maybe that's what the "d" stands for in "fdlibm".

> I don't know how compliant current implementations actually are.

I think there's only one implementation right now, available only to paid-up Apple developers. Anyway, I'm more interested in what the spec says than in current conformance... implementations will gradually become more compliant.

Thanks,
Josh

> - Bert -
Reading the CUDA 2.0 programming guide, I saw an interesting difference between the error bounds for single- and double-precision floating point:

Double-precision:
  Operation   ULPs
  x+y         0 (IEEE-754 round-to-nearest-even)
  x*y         0 (IEEE-754 round-to-nearest-even)
  x/y         0 (IEEE-754 round-to-nearest-even)
  1/x         0 (IEEE-754 round-to-nearest-even)
  sqrt(x)     0 (IEEE-754 round-to-nearest-even)

Single-precision:
  Operation   ULPs
  x+y         0 (IEEE-754 round-to-nearest-even)
  x*y         0 (IEEE-754 round-to-nearest-even)
  x/y         2
  1/x         1
  sqrt(x)     3

So there might be some hope, at least for double-precision ops. Currently, AFAIK, the latest NVIDIA GPUs are the only ones with double-precision FP support. But things are changing rapidly in this area: in about a year, Intel will release Larrabee, which will "fully support IEEE standards for single and double precision floating-point arithmetic". Hopefully this forces the other vendors to follow suit, and future OpenCL revisions reflect this.

Cheers,
Josh

Josh Gargus wrote:
> Bert Freudenberg wrote:
>> On 10.01.2009, at 11:01, Josh Gargus wrote:
>>> As noted by John, Croquet uses fdlibm for bit-identical floating-point
>>> math. Does anyone have a feeling for how difficult (or impossible) it
>>> will be to achieve identical computation on OpenCL-compliant devices?
>>
>> The numerical behavior of compliant OpenCL implementations is covered
>> in section 7 of the OpenCL spec. In particular, table 7.1 gives the
>> error bounds for the various operations. If I interpret it correctly,
>> very few functions are guaranteed to behave bit-identically.
>
> Oops, I was skimming by the time I read that part of the spec. I saw
> that the transcendental functions need not return identical results
> (which was why I mentioned porting fdlibm), but I missed that even x/y
> isn't precisely specified.
>
>>> For example, how difficult would it be to port, say, fdlibm, so that
>>> transcendentals use the exact same code? Any other show-stoppers that
>>> might not occur to the naive mind :-) ?
>>
>> Well, OpenCL only requires single precision; double-precision support
>> is optional, whereas fdlibm is double-precision only.
>
> I didn't know that. Maybe that's what the "d" stands for in "fdlibm".
>
>> I don't know how compliant current implementations actually are.
>
> I think there's only one implementation right now, available only to
> paid-up Apple developers. Anyway, I'm more interested in what the spec
> says than in current conformance... implementations will gradually
> become more compliant.
>
> Thanks,
> Josh
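When checking results against ULP bounds like these, a small helper along the following lines can be handy (a sketch, assuming both values are positive, finite, and non-NaN; the sign-magnitude encoding of doubles needs extra handling for negative or mixed-sign pairs):

    /* Distance in ULPs between two positive finite doubles: for such
       values the IEEE-754 bit patterns are monotonically ordered, so
       the integer difference counts the representable values between
       them. */
    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    static int64_t ulp_diff(double x, double y)
    {
        int64_t ix, iy;
        memcpy(&ix, &x, sizeof ix);
        memcpy(&iy, &y, sizeof iy);
        return llabs(ix - iy);
    }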