Number operations and comparisons


Jan Weerts
Hi all,

I accidentally stumbled across some problems in the Magnitude area and
wonder why they are there.

These do-its all break in one way or another, which I would not expect:

- 1e39 inspect
no calculation at all, really; the value is just not representable as a
Float, while it is easily representable as a LargeInteger or a Double.
Since there seems to be no problem jumping from SmallInteger to
LargeInteger, I do not understand why a Double would not be a valid
return value (see the sketch after this list).

- 1d309 inspect
fails similarly, although the value is still representable as a LargeInteger

- (10 raisedTo: 40) < 1e6
the LargeInteger is about to be converted to a Float in
Float>>lessFromInteger:, but no Float could ever be larger than it.
My guess is that more double dispatching to Integer and its
subclasses would help a lot.

- (10 raisedTo: 40) log
fails because it takes a detour through Float. There are many more
functions like that, typically involving sends of
#asLimitedPrecisionReal, which in turn only covers Float and not
Double. For my taste, either a Double coercion would help (pushing the
limits further) or subclass implementations for LargeInteger are
missing here.
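
To make the asymmetry concrete, here is a small workspace sketch (class names and SmallInteger maxVal are as in VisualWorks; the commented results are what one would expect, not verified output):

  (SmallInteger maxVal + 1) class.    "LargePositiveInteger -- integer arithmetic just switches class"
  (10 raisedTo: 39) class.            "LargePositiveInteger"
  10.0d raisedTo: 39.                 "the value 1e39 is perfectly representable as a Double"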

Regards
   Jan

Re: Number operations and comparisons

Andres Valloud-6
Let's see...

1e39 inspect.  As per the syntax, 1e39 is supposed to be a float.
However, the value specified in float syntax is beyond the reach of
floats.  Something similar happens with 1d309.  Check out Float
class>>fmax and Double class>>fmax for the largest possible
representable values.
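
For reference, the limits are the usual IEEE 754 ones (values approximate; check the fmax methods in the image for the exact figures):

  Float fmax.     "about 3.40282e38  -- 1e38 still parses, but 1e39 cannot be a Float"
  Double fmax.    "about 1.79769e308 -- 1d308 still parses, but 1d309 cannot be a Double"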

(10 raisedTo: 40) < 1e6.  This works in 7.7.1.  Other comparisons were
broken similarly, but that is no longer the case :).
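
Assuming the 7.7.1 fix compares the values exactly, these now answer what one would expect:

  (10 raisedTo: 40) < 1e6.    "false"
  1e6 < (10 raisedTo: 40).    "true"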

(10 raisedTo: 40) log.  This one still breaks because the default
precision for floating point is single precision.  Changing
asLimitedPrecisionReal is not that simple, though.  If you want to use
doubles, you might want to try something like (10.0d raisedTo: 40) log.
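
A sketch of the workaround (results approximate; the second line assumes LargeInteger>>asDouble behaves as expected):

  (10.0d raisedTo: 40) log.          "about 40.0, computed entirely in Double"
  (10 raisedTo: 40) asDouble log.    "another possible detour through Double"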

Andres.


Re: Number operations and comparisons

Jan Weerts
Hi Andres!

I somehow expected that you would respond to this :)

Valloud, Andres wrote:
> 1e39 inspect.  As per the syntax, 1e39 is supposed to be a float.
> However, the value specified in float syntax is beyond the reach of
> floats.  Something similar happens with 1d309.  Check out Float
> class>>fmax and Double class>>fmax for the largest possible
> representable values.

I am quite aware of the syntax and its implications. My point is that
it does not have to fail, because the exact same number is still
representable as an object of some other class. And since this is not
Java, and we already juggle Integers and LargeIntegers in one go, why is
it different for limited-precision reals? Well, I suppose somewhere in an
X-colored book lies the root of this *insert appropriate term here*.

> (10 raisedTo: 40) < 1e6.  This works in 7.7.1.  Other comparisons were
> broken similarly, but that is no longer the case :).

nice, gimme.

> (10 raisedTo: 40) log.  This one still breaks because the default
> precision for floating point is single precision.  Changing
> asLimitedPrecisionReal is not that simple, though.  If you want to use
> doubles, you might want to try something like (10.0d raisedTo: 40) log.

... and fail again at a higher #raisedTo: argument?
My point here was that it would be quite easy to implement a lot of
those functions for LargeInteger; #asLimitedPrecisionReal feels like a
crutch to avoid implementing them for these kinds of objects.

I forgot to mention why I posted this: we tested some input
validation code (and users can be quite inventive) and found out about
the XeY problem of the number parser. From there it took off into some
other tests in that direction.

Regards
   Jan

Re: Number operations and comparisons

Andres Valloud-6
In general terms, I suspect that the root of the insertable term is the
desire not to guess what the developer meant, because invariably some
guess will be wrong.  And then, well... beware of the developers'
wrath!  Thus, let's not guess.  It may seem limited.  But it is also
consistent and predictable.

Regarding implementing log and other functions for large integers... I
think part of the problem is that implementing fully precise algorithms
that run very quickly is not that easy for arbitrary-precision
arguments.  For example, how long would you be willing to wait for this?

(10 raisedTo: 400) log: 7

Should the answer be a double or a fixed point?  How many decimals
should you get?  What happens if you want this instead?

(10 raisedTo: 400) log: Double pi

To what precision should the calculations be carried out?  Who sets the
default?  I think sqrtFloor and sqrtRounded are generally useful for
large integers.  Fortunately, in the case of logs, usually you can
either approximate or use the number's own construction to figure out
its logarithm in some base, e.g.: log_7 (10^400) = 400 * log_7 10, or

LargePositiveInteger>>log

  self <= Double fmax truncated ifTrue: [^self asDouble log].
  ^Double fmax log + (self // Double fmax truncated) log

But is the potential for accumulated error acceptable in the above
method?  It is not possible to guess how arbitrary applications will
tolerate errors in the above method.  Similarly, it is not possible to
guess how arbitrary applications will tolerate performance penalties due
to the desire to get even the very last bit right (never mind what
happens when the result is irrational and thus requires infinite
decimals to express correctly).  What should be done?  Guess, and thus
run the risk of guessing wrong?  Maybe we just need a loadable package
with some of these things implemented in some way so that others can
extend / refine the functionality for their own purposes.
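
For the constructed case above, the identity route is cheap even at Float precision (results approximate):

  (10 log: 7) * 400.    "about 473.3, i.e. (10 raisedTo: 400) log: 7"
  10 log * 400.         "about 400.0, i.e. (10 raisedTo: 400) log"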

Andres.

Re: Number operations and comparisons

Nicolas Cellier
Valloud, Andres <avalloud <at> cincom.com> writes:

>
> snip.. (gmane oblige)
>
> LargePositiveInteger>>log
>
>   self <= Double fmax truncated ifTrue: [^self asDouble log].
>   ^Double fmax log + (self // Double fmax truncated) log
>
> But is the potential for accumulated error acceptable in the above
> method?  It is not possible to guess how arbitrary applications will
> tolerate errors in the above method.  Similarly, it is not possible to
> guess how arbitrary applications will tolerate performance penalties due
> to the desire to get even the very last bit right (never mind what
> happens when the result is irrational and thus requires infinite
> decimals to express correctly).  What should be done?  Guess, and thus
> run the risk of guessing wrong?  Maybe we just need a loadable package
> with some of these things implemented in some way so that others can
> extend / refine the functionality for their own purposes.
>
> Andres.
>

Hi again,
For fun, I just put a patch into the Squeak inbox similar to your
proposition, but I don't know whether it is really useful.
For example, what would we expect here:
   (10 raisedTo: 40) ln exp

Concerning accuracy, isn't the Intel x86 log already up to 3 ulp off?
So maybe LargeInteger>>log won't be much worse.
Note that a naive algorithm can be very bad accuracy-wise for Fractions
due to gradual underflow.
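
For concreteness, with a Double-based implementation the round trip would presumably look like this (assuming #asDouble; the exact result depends on the underlying libm):

  (10 raisedTo: 40) asDouble ln exp.    "about 1.0e40, but in general not the Double we started from"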

cheers

Nicolas

Re: Number operations and comparisons

Andres Valloud-6
Generally speaking, the problem with this type of method is that it is
only suitable once you know the context of application.  However,
from the POV of a language vendor, we do not know the context of
application.  Thus, if we did something, we'd be forced to guess, and
then we would eventually guess wrong.  So, it is better not to guess.

Re: Number operations and comparisons

Jan Weerts
In reply to this post by Andres Valloud-6
Hi Andres!

Valloud, Andres wrote:
> In general terms, I am suspicious that the root of the insertable term
> is the desire not to guess what the developer meant because invariably
> some guess will be wrong.  And then, well... beware of the developers'
> wrath!  Thus, let's not guess.  It may seem limited.  But, also, it's
> consistent and predictable.

Sorry about the previous verbal fit. But consistency is what I argue
for and do not see in the implementation. The objects are all Numbers,
yet they behave differently depending on their class and value. #log
is implemented on Number, so I would expect it to work for all Numbers.

Predictability is only given if the "wrathful" developers know all the
hinges of the implementation and catch all possible errors in places
where a quick glance at the location of the implementation would not
have predicted a possible failure path.

> Regarding implementing log and other functions for large integers... I
> think part of the problem is that implementing fully precise algorithms
> that run very quickly is not that easy for arbitrary precision
> arguments.  For example, how much would you be willing to wait for this?
>
> (10 raisedTo: 400) log: 7

Well, I do know that you know a lot more about mathematics than I do,
but I thought a little bit of bit copying on the binary representation
of a LargePositiveInteger, to convert it to a Double's mantissa, plus
some basic log mathematics would have helped. I'll give it a try when I
find the time. And possibly, when I want to handle large values, I might
be inclined to wait a little longer than usual.
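
A minimal sketch of that bit-copying idea, for illustration only (it assumes #highBit, #bitShift: and #asDouble behave as in the standard image, and it inherits all the accuracy caveats discussed in this thread):

LargePositiveInteger>>log

  "Shift the receiver down so that its leading 53 bits fit a Double
  exactly, take the Double log, then add back the base-10 log of the
  power of two that was shifted out."
  | shift |
  shift := (self highBit - 53) max: 0.
  ^(self bitShift: shift negated) asDouble log + (shift * 2.0d log)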

> Should the answer be a double or a fixed point?

Well, a FixedPoint right now returns a Float, so keep it that way until
it overflows to a Double :)

I read all of your questions and know that one really has to think
about these problems if something is going to happen in that area. I
am not in a position to find all valid use cases; I just commented
on the problems we encountered while parsing user input. The
application domain we typically handle does not care about huge
precision, and calculation efficiency on reals is also not a big deal,
so I expect e.g. Nicolas to be not very fond of the requirements I
would specify.
But what is in the interest of our customers is stable applications. And
in this area we (developers) found out (luckily while testing) that the
given library throws errors in places we did not expect. Hence the mail.

> Maybe we just need a loadable package
> with some of these things implemented in some way so that others can
> extend / refine the functionality for their own purposes.

That might be an idea.

Thanks for your feedback
   Jan

Re: Number operations and comparisons

Andres Valloud-6
I think the consistency we have right now is that everything except
Doubles goes through Float because that's what asLimitedPrecisionReal
does.  I understand it may not be the kind of consistency you want,
though :).  Unfortunately, changing it is not that easy because of the
implications.  We might have to wait until 8.0 to do that, because then
we get a license to potentially break backwards compatibility.  And
perhaps it's not a good idea at all: imagine tests written like self
assert: -1 arcCos = Float pi, or self assert: resultOfNumericalCalculations
= someResultThatHappenedWithThePreviousCode, or loading old data from
the database and finding that you can't reproduce the old results, etc.

Roughly speaking, it's not clear what the behavior of
veryLargePositiveInteger log should be.  For example,

> Well I do know that you know a lot more about mathematics than I do,
> but I thought a little bit of bit copy operations on the binary
> representation of a LargePositiveInteger to convert it to a Doubles
> mantissa plus some basic log-mathematics would have helped.

Well but see, the problem is that to get an accurate result you may need
to calculate log 2 to a lot of decimals so that you can do things like

veryLargePositiveInteger log
  = (2^k * smallerLargePositiveInteger) log
  = (k * 2 log) + smallerLargePositiveInteger log

with some degree of accuracy.  What degree of accuracy are we talking
about?  What if a double does not have enough significant digits for
10^(veryLargePositiveInteger log) to give you veryLargePositiveInteger
back?  Since veryLargePositiveInteger does not fit in a double, should
we then also implement integer (and fraction, fixed point, etc.) e^x?
To what precision?  With what kind of performance impact?  How do we
know what is a reasonable answer in the absence of actual use cases?
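
To put a number on the "enough significant digits" question: a Double carries roughly 15-16 significant decimal digits, so for a 41-digit integer most of the digits never even reach the logarithm (the printed result is an assumption):

  ((10 raisedTo: 40) + 12345) asDouble log.
    "40.0 -- at Double precision this is indistinguishable from (10 raisedTo: 40) asDouble log"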

> And possibly, when I want to handle large values I might be inclined
> to wait a little longer than usual.

The problem is that, generally speaking, it doesn't take a "little"
longer.  Rather, it takes MUCH longer.  I'd expect that just emulating
what a modern FPU is capable of in software, without using floating
point operations, would execute at least a good 3-5x slower, if not much
worse.  I'd be happy predicting that doing the same with arbitrary
precision will become slower in proportion to the input sizes (if not
worse, like in proportion to the square of the input sizes).  So, doing
an operation with 10^400 should take at least ~100 times more time than
the same operation with a double (10^400 has about 1330 bits, which is
more than 20 times larger than a double's 52/53-bit mantissa, and then we have to
apply another 3-5x performance penalty factor on top of that, and we're
still assuming linear performance deterioration which I suspect is very
optimistic, and we're also assuming the implementation of arbitrary
precision arithmetic in Smalltalk will be as fast as in C, etc).

So then we should compromise somewhat.  But how much can we compromise
when we do not know what customers need?  Or, perhaps more precisely,
how much should we compromise when different customers will have
completely different needs?  The problem here is that no matter what we
do, I don't think we can really make all customers happy by changing the
default implementation for everybody this time.  Hence the loadable
packages approach.  Now, if there are things that make the loadable
packages approach unnecessarily difficult, I'd even risk it and say the
obstacles should be made easier to deal with.

Andres.
