Pharo float precision vs Python

Pharo float precision vs Python

Tim Mackinnon
Hi guys - I recall this came up a few months ago, but I’m curious about the difference between Pharo’s use of Float64 and Python’s floats. I assumed that if both languages follow the same IEEE spec (IEEE 754, I believe), simple arithmetic would give very similar results.

I am curious why in Python adding these numbers:

y = 987.9504418944 + 815.2627636718801 + 1099.3898999037601 + 1021.6996069333198 + 1019.8750146478401 + 1084.5603759764 + 1008.2985131833999 + 1194.9564575200002 + 893.9680444336799 + 1032.85460449136 + 905.9324633786798 + 1024.2805590819598 + 784.5488305664002 + 957.3522631840398 + 1001.7526196294
print(y)
print(y / 15)

Gives:

14832.682458496522
988.8454972331015

In Pharo I have noticed an anomaly, which I thought was a precision issue but which may be something odd with iterators.

y := 987.9504418944 + 815.2627636718801 + 1099.3898999037601 + 1021.6996069333198 + 1019.8750146478401 + 1084.5603759764 + 1008.2985131833999 + 1194.9564575200002 + 893.9680444336799 + 1032.85460449136 + 905.9324633786798 + 1024.2805590819598 + 784.5488305664002 + 957.3522631840398 + 1001.7526196294.
y.
y / 15.

Gives the same as Python.
BUT:
z := {987.9504418944 . 815.2627636718801 . 1099.3898999037601 . 1021.6996069333198 . 1019.8750146478401 . 1084.5603759764 . 1008.2985131833999 . 1194.9564575200002 . 893.9680444336799 . 1032.85460449136 . 905.9324633786798 . 1024.2805590819598 . 784.5488305664002 . 957.3522631840398 . 1001.7526196294} sum.
z.
z / 15.

Gives
14832.68245849652
988.8454972331014

Is this correct?
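
One way to judge which of the two results is closer to the true sum is to add the same doubles exactly and round only once at the end. The Python sketch below is an illustration added alongside the post, not something from the original author: it uses the standard `fractions.Fraction` type for the exact sum and a plain left-to-right loop for the float sum. The names `VALUES`, `plain_sum` and `exact_sum` are invented for this example.

```python
from fractions import Fraction

# The fifteen values from the post above.
VALUES = [
    987.9504418944, 815.2627636718801, 1099.3898999037601,
    1021.6996069333198, 1019.8750146478401, 1084.5603759764,
    1008.2985131833999, 1194.9564575200002, 893.9680444336799,
    1032.85460449136, 905.9324633786798, 1024.2805590819598,
    784.5488305664002, 957.3522631840398, 1001.7526196294,
]


def plain_sum(values):
    """Left-to-right float addition, like the chained `+` expression."""
    total = 0.0
    for v in values:
        total += v  # each step rounds to the nearest double
    return total


def exact_sum(values):
    """Sum the doubles as exact rationals, with no rounding at any step."""
    return sum((Fraction(v) for v in values), Fraction(0))


if __name__ == "__main__":
    approx = plain_sum(VALUES)
    exact = exact_sum(VALUES)
    print("float loop        :", repr(approx))
    print("exact, rounded once:", repr(float(exact)))
    print("accumulated error  :", float(Fraction(approx) - exact))
```

Comparing each candidate result against `float(exact)` shows how many rounding errors it has accumulated; neither language is "wrong", they simply round at different intermediate steps.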


Re: Pharo float precision vs Python

Tim Mackinnon
Actually I can answer my own question - it’s the difference between #sum and #sumNumbers (an easy mistake to make; I almost wish that #sum had the #sumNumbers implementation and that there was a #sumSample that behaved as #sum does now).


Re: Pharo float precision vs Python

Tim Mackinnon
Actually this isn’t quite so simple: the problem outlined earlier in this thread is compounded by the use of #average (which uses #sum and not #sumNumbers) and thus gives a less precise answer.

Why wouldn’t #average use #sumNumbers inside? Or does there need to be #averageNumbers as a complement… Argggg it makes my head hurt.

It’s important, as our results then compare differently to Python’s and this makes us waste time.

Tim


Re: Pharo float precision vs Python

Sven Van Caekenberghe-2
In reply to this post by Tim Mackinnon
Thanks for sharing, this is indeed something quite subtle.

> On 20 Mar 2020, at 16:19, Tim Mackinnon <[hidden email]> wrote:
>
> Actually I can answer my own question - its the difference between #sum and #sumNumbers (and an easy mistake to make - I almost wish that sum was the sumNumbers implementation and there was a sumSample that behaved like now)

There was a *LOT* of debate about that (especially that #() sum isZero).

I would also prefer #sum to be like you describe.


Pharo #sum vs #sumNumbers and the consequence of #average

Tim Mackinnon
I’m interested in exploring this - partly because it hit me and wasted my time chasing down why our #sum differs from other languages. It seems that the implementation is trying to be very clever - I guess because of the generic usage of collections, which don’t have to contain only numbers.

The kicker for me (from the earlier messages) was that it’s not obvious that the #average of a collection of floats loses precision due to #sum.

For me, the typical case is numbers - and looking at senders of #sum they all seem to be expecting numeric summation. So why wouldn’t we put the onus for generic summation back on the caller (I’m assuming checking the collection internally is too expensive to do it automatically)?

If we were to give #sum the implementation of #sumNumbers, and then introduce a new method #sumObjects (and forward #sumNumbers to #sum for partial compatibility), would all the test cases in the image catch this and give an indication of the consequences?

Could we even consider such a change? It’s brave - but shouldn’t Pharo behave like you would expect (or am I missing an obvious use case)?

Tim


Re: Pharo #sum vs #sumNumbers and the consequence of #average

Sven Van Caekenberghe-2
Like I said, we have had big discussions about this; I am not sure we want to revisit them.

It is not just the problem that summing assumes the elements to be numbers (whatever that is, let's say objects that can be added), it is also the question of what the additive identity should be (0 most would say, but for points that would be 0@0, etc.), and there is the question of what should happen with an empty collection (because we want to avoid testing for that as users). Now, all of this can be solved; the question is what behaviour gets the nicest selectors (#sum being the one that everybody wants).

What you found out now is that the clever trick used to avoid picking an additive identity (picking an element, counting it twice and then subtracting it) leads to a loss of precision when floating point numbers are involved. This is an important issue.
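
The effect can be reproduced outside Pharo. The Python sketch below assumes the trick works as just described (seed the accumulator with one element, add every element, then subtract the seed); the function names are invented for illustration and the actual Pharo method may differ in detail.

```python
def sum_from_zero(values):
    """Conventional summation: start at 0.0 and add each element once."""
    acc = 0.0
    for v in values:
        acc = acc + v
    return acc


def sum_with_sample(values):
    """Seed with one element, add everything, then subtract the seed."""
    sample = values[0]       # stands in for "pick any element"
    acc = sample
    for v in values:         # the sample is added a second time here
        acc = acc + v
    return acc - sample      # undo the extra copy of the sample


if __name__ == "__main__":
    data = [1e16, 2.0]
    # Starting from zero keeps the 2.0: 1e16 + 2 is exactly representable.
    print(repr(sum_from_zero(data)))
    # The seeded version forms 2e16 + 2.0, where doubles are 4 apart,
    # so the 2.0 is rounded away before the seed is subtracted again.
    print(repr(sum_with_sample(data)))
```

The example is deliberately extreme. With the fifteen values from the start of the thread the difference is only in the last printed digit, but a plausible reading is the same: the running total is larger by one element throughout, so some intermediate roundings land differently.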

In the end, #inject:into: is so flexible that it easily solves most issues at the expense of being more verbose. The fight is over the nicest selectors.
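
For readers more at home in Python, the closest analogue of #inject:into: is `functools.reduce` with an explicit initial value. The sketch below is only an illustration of that idea; `Point` and `sum_with_identity` are hypothetical names, with `Point(0, 0)` standing in for 0@0.

```python
from dataclasses import dataclass
from functools import reduce
import operator


@dataclass(frozen=True)
class Point:
    """Tiny 2D point, used only to show a non-numeric additive identity."""
    x: float
    y: float

    def __add__(self, other):
        return Point(self.x + other.x, self.y + other.y)


def sum_with_identity(values, identity):
    """Fold `+` over values, starting from a caller-supplied identity."""
    return reduce(operator.add, values, identity)


if __name__ == "__main__":
    print(sum_with_identity([1.5, 2.5], 0.0))
    print(sum_with_identity([Point(1, 2), Point(3, 4)], Point(0, 0)))
    print(sum_with_identity([], 0.0))  # empty input just returns the identity
```

The caller pays for the flexibility by naming the identity every time, which is exactly the verbosity trade-off mentioned above.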


Re: Pharo #sum vs #sumNumbers and the consequence of #average

jgfoster

> On Mar 23, 2020, at 6:06 AM, Sven Van Caekenberghe <[hidden email]> wrote:
>
> What you found out now is that the clever trick used to avoid picking an additive identity (picking an element, counting it twice and then subtracting it) leads to a loss of precision when floating point numbers are involved. This is an important issue.

If this approach is to be preserved, then each class should have an additive identity so instead of adding and subtracting an object, we let the object tell us its zero.

James
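
A rough Python sketch of this suggestion follows. It is an illustration only: the `zero()` method is a hypothetical protocol, `Money` is an invented class, and the fallback `type(sample)()` relies on the fact that Python's built-in numeric types construct their zero when called without arguments.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Money:
    """Invented example class that knows its own additive identity."""
    cents: int

    def __add__(self, other):
        return Money(self.cents + other.cents)

    def zero(self):
        return Money(0)


def zero_of(sample):
    """Ask the sample for its zero (hypothetical protocol)."""
    if hasattr(sample, "zero"):
        return sample.zero()
    return type(sample)()  # int() is 0, float() is 0.0, Fraction() is 0


def sum_asking_for_zero(values):
    """Start from the zero supplied by the first element."""
    iterator = iter(values)
    try:
        first = next(iterator)
    except StopIteration:
        # No element means nobody to ask: the empty case stays open.
        raise ValueError("empty collection: no element to supply a zero")
    acc = zero_of(first) + first
    for v in iterator:
        acc = acc + v
    return acc
```

Nothing is added twice or subtracted afterwards, so the float result matches plain left-to-right addition.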
Re: Pharo #sum vs #sumNumbers and the consequence of #average

khinsen
On 23.03.20 at 14:45, James Foster wrote:

>> On Mar 23, 2020, at 6:06 AM, Sven Van Caekenberghe <[hidden email]> wrote:
>>
>> What you found out now is that the clever trick used to avoid picking an additive identity (picking an element, counting it twice and then subtracting it) leads to a loss of precision when floating point numbers are involved. This is an important issue.
>
> If this approach is to be preserved, then each class should have an additive identity so instead of adding and subtracting an object, we let the object tell us its zero.

Or define a singleton class "Zero" with a + method that returns the
other operand, and use that Zero object for the additive identity.

Konrad.
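
The singleton idea translates fairly directly to Python. The sketch below is a hypothetical rendering, not an existing library feature: `_Zero`, `ZERO` and `sum_from_universal_zero` are names made up for this example.

```python
class _Zero:
    """Universal additive identity: adding it returns the other operand."""

    def __add__(self, other):
        return other

    def __radd__(self, other):
        return other

    def __repr__(self):
        return "ZERO"


# Single shared instance, playing the role of the singleton.
ZERO = _Zero()


def sum_from_universal_zero(values):
    """Accumulate from ZERO, so no element type has to be known up front."""
    acc = ZERO
    for v in values:
        acc = acc + v  # the first addition simply yields the first element
    return acc
```

After the first element the accumulator is an ordinary value of the element type, so floats are added exactly as in a plain loop. An empty collection returns `ZERO` itself, which leaves open what the caller should do with it.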


Re: Pharo #sum vs #sumNumbers and the consequence of #average

Sven Van Caekenberghe-2
Both are excellent suggestions.

We have to think a bit about the consequences.

Still, both would not solve the problem of what to return when the collection is empty.


Re: Pharo #sum vs #sumNumbers and the consequence of #average

jgfoster

> On Mar 23, 2020, at 8:14 AM, Sven Van Caekenberghe <[hidden email]> wrote:
>
> Both are excellent suggestions.
>
> We have to think a bit about the consequences.
>
> Still, both would not solve the problem of what to return when the collection is empty.

Zero?


Re: Pharo #sum vs #sumNumbers and the consequence of #average

Tim Mackinnon
I’m always impressed with the quality of answers that come out of these discussions - inevitably I’m reminded that dispatching off the right parties is ultimately where the power lies (when you cheat - it always seems to end up with a gotcha).

Thanks guys.

Tim


Re: Pharo #sum vs #sumNumbers and the consequence of #average

Sven Van Caekenberghe-2
https://github.com/pharo-project/pharo/issues/2225


Re: Pharo #sum vs #sumNumbers and the consequence of #average

Tim Mackinnon
Thanks for the GitHub issue - sadly this goes back a long way :(

I’ve added my example to the sad history… is there anyone that can rule on this?


Re: Pharo #sum vs #sumNumbers and the consequence of #average

Richard O'Keefe
In reply to this post by jgfoster
What's wrong with this definition?
    sum
      "(1) This used to use #runsDo:, which can lead to big savings
           on bags and runarrays, but we almost never use this method
           on them, so it didn't pay off.
       (2) The obvious initialisation to 0 doesn't work with a
           collection of Money, Quantities, Durations, and so on.
       (3) Since the receiver is almost always a collection of
           numbers, it would be very bad if #() sum did not answer 0.
           So we should not use #anyOne.
       (4) Using #anyOne to select between algorithms required being
           able to traverse the Enumerable twice, but Enumerable is
           for things you can only traverse once.  Oops.  Good thing
           we want to avoid #anyOne anyway.
       (5) nil does not and should not respondTo: #+ so we can use a
           single variable.  Watch out for this in other summaries."
      |s|
      s := nil.
      self do: [:each |
        s := s ifNil: [each] ifNotNil: [each + s]].
      ^s ifNil: [0] ifNotNil: [s]
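
For comparison with the Python results earlier in the thread, here is a loose Python translation of the method above. It is a sketch under the assumption that `None` can play the role of nil; `streaming_sum` is an invented name.

```python
from datetime import timedelta


def streaming_sum(items):
    """Sum in a single pass, using None as the 'nothing seen yet' marker."""
    s = None
    for each in items:
        # The first element becomes the running total; later ones are added.
        s = each if s is None else each + s
    # Only the empty case needs a default, and 0 is chosen for it.
    return 0 if s is None else s


if __name__ == "__main__":
    print(streaming_sum(x for x in [1, 2, 3]))   # works on a one-shot generator
    print(streaming_sum([timedelta(hours=1), timedelta(minutes=30)]))
    print(streaming_sum([]))                     # falls back to 0
```

Because each element is visited once and nothing is subtracted afterwards, the float result is the same as adding the values left to right.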


Re: Pharo #sum vs #sumNumbers and the consequence of #average

Richard O'Keefe
In reply to this post by jgfoster
There are situations where #+ and #sum make sense but there is no additive
identity.  Here is an example:
#((1 2 3) (4 5 6) (7 8 9)) sum


Re: Pharo #sum vs #sumNumbers and the consequence of #average

Sven Van Caekenberghe-2
#(0 0 0) is the additive identity in that case



Re: Pharo #sum vs #sumNumbers and the consequence of #average

Richard O'Keefe
Yes, #(0 0 0) is the additive identity in *that* case.
But in #((1 2) (3 4) (5 6)) sum it is #(0 0).
And in #(((1 2 3) (4 5 6)) ((9 8 7) (6 5 4))) sum
==> #(#(10 10 10) #(10 10 10)) it is #((0 0 0) (0 0 0)).
I suppose you could have
SequenceableCollection>>zero
  ^self collect: [:each | each zero]
The point is that it cannot in general be based on the *class* of some
element of the receiver but depends on the *value*.
But then we run into the amusing cases like
{ 1 hour. DateAndTime now} sum
where an additive identity for Durations makes sense but an
additive identity for DateAndTimes does not, and similarly
{Date today. 1} sum
In my case, an additional problem is that I have
Object
  Enumerable (#sum  and #product go here)
    Collection
where Enumerable holds everything that can be defined just using #do:
without assuming that you can do it more than once, so it was necessary
to come up with a streaming definition for #sum that examines each
element once and only once.  When you do that, you discover that you
don't *need* a zero except when the collection is empty, and in that case
you can't *tell* what sort of zero would be appropriate.
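
The two points above, that a zero depends on the value rather than the class and that some addable pairs have no shared zero, can be sketched in Python. This is an illustration only: nested lists stand in for Smalltalk arrays, `datetime`/`timedelta` stand in for DateAndTime/Duration, and all function names are invented.

```python
from datetime import datetime, timedelta


def zero_like(value):
    """Build a zero with the same shape as value (cf. the #zero idea above)."""
    if isinstance(value, (list, tuple)):
        return [zero_like(each) for each in value]
    return 0


def add_elementwise(a, b):
    """Element-wise addition for nested lists of equal shape."""
    if isinstance(a, (list, tuple)):
        # zip stops at the shorter operand; equal shapes are assumed here.
        return [add_elementwise(x, y) for x, y in zip(a, b)]
    return a + b


def can_add(a, b):
    """Report whether a + b is defined for these two values."""
    try:
        a + b
        return True
    except TypeError:
        return False


if __name__ == "__main__":
    left = [[1, 2, 3], [4, 5, 6]]
    right = [[9, 8, 7], [6, 5, 4]]
    print(zero_like(left))
    print(add_elementwise(left, right))
    # A duration can be added to a timestamp, but two timestamps cannot
    # be added, so there is no common zero for a mixed collection.
    print(can_add(timedelta(hours=1), datetime(2020, 3, 26)))
    print(can_add(datetime(2020, 3, 26), datetime(2020, 3, 26)))
```

`zero_like` needs an element to inspect, so it gives no answer for an empty collection either.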
