[squeak-dev] Re:Collection>>sum implementation

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
14 messages Options
Reply | Threaded
Open this post in threaded view
|

[squeak-dev] Re:Collection>>sum implementation

Jerome Peace
[squeak-dev] Collection>>sum implementation Jason Johnson jason.johnson.081 at gmail.com Mon Aug 11 11:57:54 UTC 2008 Hi Jason, I remember debugging an instance of this problem a while back. Ah found it: http://bugs.squeak.org/view.php?id=6560 [BUG][FIX] "{Color red. Color green. Color blue} sum" can't return "Color white" The solution was to inject the zero element then sum the whole list. How do you get a zero element when you don't know what you are dealing with? You let it tell you: any := myCollection anyone . zeroElement := any - any . answer := myCollection inject: zeroElement into: [ :sum :each | sum + each ] . The mantis report needs to be harvested. Hth, Yours in curiosity and service, --Jerome Peace *** Jason>Hello all, Jason> Jason>In my explorations today I happened on the implementation of the #sum Jason>method: Jason> Jason>Collection>>sum Jason> "This is implemented using a variant of the normal inject:into: pattern. Jason> The reason for this is that it is not known whether we're in the normal Jason> number line, i.e. whether 0 is a good initial value for the sum. Jason> Consider a collection of measurement objects, 0 would be the unitless Jason> value and would not be appropriate to add with the unit-ed objects." Jason> | sum sample | Jason> sample _ self anyOne. Jason> sum _ self inject: sample into: [:accum :each | accum + each]. Jason> ^ sum - sample Jason> Jason> Jason>Personally, I view the code in the core image to, among other things, Jason>serve the role of showing new Smalltalk programmers good ways of using Jason>the language. This is exactly the kind of thing I would *not* want Jason>people to see. Jason> Jason>The obvious issue here is that #inject:into: doesn't fit this case Jason>very well. But rather then go to these kinds of steps a much better Jason>solution would be to simply notice that a specialized variation of Jason>#inject:into: (that is in fact quite common) is needed here. Namely Jason>a reduction method that just uses the first element as the starting Jason>value, conceptually: Jason> Jason>Collection>>reduce: aBlock Jason> ^ self allButFirst inject: self first into: aBlock Jason> Jason>As it turns out there exists a more efficient implementation of this Jason>method in Lukas' Magritte-Model package. I would propose his method Jason>be promoted to a core method so #sum could be rewritten as: Jason> Jason>sum Jason> "sum the reciever" Jason> ^ self reduce: [:accum :each| accum + each ] Jason> Jason>Thanks, Jason>Jason ***



Reply | Threaded
Open this post in threaded view
|

Re: [squeak-dev] Re:Collection>>sum implementation

Randal L. Schwartz
>>>>> "Jerome" == Jerome Peace <[hidden email]> writes:

Jerome> You let it tell you:

Jerome> any := myCollection anyone  .
Jerome> zeroElement := any - any .

This presumes that anything that implements #+ is also expected
to implement #-.  Yes, that was also present in the original implementation,
but a version that doesn't require that would be nice.

--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<[hidden email]> <URL:http://www.stonehenge.com/merlyn/>
Smalltalk/Perl/Unix consulting, Technical writing, Comedy, etc. etc.
See http://methodsandmessages.vox.com/ for Smalltalk and Seaside discussion

Reply | Threaded
Open this post in threaded view
|

[squeak-dev] Re: Collection>>sum implementation

Klaus D. Witzel
On Mon, 11 Aug 2008 22:47:25 +0200, Randal L. Schwartz wrote:

>>>>>> "Jerome" == Jerome Peace <[hidden email]> writes:
>
> Jerome> You let it tell you:
>
> Jerome> any := myCollection anyone  .
> Jerome> zeroElement := any - any .

Problem here is that #() has no anyone, and so #negated (assuming that sum  
wants to do #+) can also not be used.

> This presumes that anything that implements #+ is also expected
> to implement #-.  Yes, that was also present in the original  
> implementation,
> but a version that doesn't require that would be nice.

Then use an object that for #+ returns a copy of the argument (no I'm not  
mentioning nil :)

  ^ myCollection inject: mysticalObject into: [:a :b | a + b]

It *must* return some mysticalObject when myCollection is empty since one  
cannot know the species of elements which are not in a collection.

This could be called an undefined object which understands #+ ;) clearly  
*not* nil.

Such things can be avoided be passing the desired zero element to  
#sumFromZero: and checking the answer, so that the sender can decide what  
its own zero means for an empty collection.

But if you want #sum rather than #sumFromZero: then for the numerical case  
it sufficies to do

  mySalarySum := myBigSalaryVector sum + 0

, sufficient because mysticalObject, assuming that's what #sum can answer  
for the empty collection, returns a copy of the extra 0 for the extra #+ ;)

And a comment in #sum's method can tell how to use it properly.

/Klaus


Reply | Threaded
Open this post in threaded view
|

Re: [squeak-dev] Re:Collection>>sum implementation

Jason Johnson-5
In reply to this post by Jerome Peace
On Mon, Aug 11, 2008 at 10:15 PM, Jerome Peace
<[hidden email]> wrote:

> [squeak-dev] Collection>>sum implementation Jason Johnson jason.johnson.081
> at gmail.com Mon Aug 11 11:57:54 UTC 2008 Hi Jason, I remember debugging an
> instance of this problem a while back. Ah found it:
> http://bugs.squeak.org/view.php?id=6560 [BUG][FIX] "{Color red. Color green.
> Color blue} sum" can't return "Color white" The solution was to inject the
> zero element then sum the whole list. How do you get a zero element when you
> don't know what you are dealing with? You let it tell you: any :=
> myCollection anyone . zeroElement := any - any . answer := myCollection
> inject: zeroElement into: [ :sum :each | sum + each ] . The mantis report
> needs to be harvested. Hth, Yours in curiosity and service, --Jerome Peace
> *** Jason>Hello all, Jason> Jason>In my explorations today I happened on the
> implementation of the #sum Jason>method: Jason> Jason>Collection>>sum Jason>
> "This is implemented using a variant of the normal inject:into: pattern.
> Jason> The reason for this is that it is not known whether we're in the
> normal Jason> number line, i.e. whether 0 is a good initial value for the
> sum. Jason> Consider a collection of measurement objects, 0 would be the
> unitless Jason> value and would not be appropriate to add with the unit-ed
> objects." Jason> | sum sample | Jason> sample _ self anyOne. Jason> sum _
> self inject: sample into: [:accum :each | accum + each]. Jason> ^ sum -
> sample Jason> Jason> Jason>Personally, I view the code in the core image to,
> among other things, Jason>serve the role of showing new Smalltalk
> programmers good ways of using Jason>the language. This is exactly the kind
> of thing I would *not* want Jason>people to see. Jason> Jason>The obvious
> issue here is that #inject:into: doesn't fit this case Jason>very well. But
> rather then go to these kinds of steps a much better Jason>solution would be
> to simply notice that a specialized variation of Jason>#inject:into: (that
> is in fact quite common) is needed here. Namely Jason>a reduction method
> that just uses the first element as the starting Jason>value, conceptually:
> Jason> Jason>Collection>>reduce: aBlock Jason> ^ self allButFirst inject:
> self first into: aBlock Jason> Jason>As it turns out there exists a more
> efficient implementation of this Jason>method in Lukas' Magritte-Model
> package. I would propose his method Jason>be promoted to a core method so
> #sum could be rewritten as: Jason> Jason>sum Jason> "sum the reciever"
> Jason> ^ self reduce: [:accum :each| accum + each ] Jason> Jason>Thanks,
> Jason>Jason ***

The formatting of your message got really trashed in gmail so I can't
be sure I understand your response, but if it is the way I read it
then I think you didn't read my message.

It is absolutely not necessary to do the inefficient:  (1) insert one
element of the receiver, (2) sum the receiver (NOTE: one element is
processed twice), (3) take the "sample" back out again.

For me this situation brings up a typical Smalltalk situation where
the code is literally telling us it that is requires a special case.
Namely a method that preforms a reduction on the receiver, but without
requiring an "initial value" to inject.  And as is often the case,
once we have this method we find it is generally useful (as evidence
by the fact that Lukas make his own).

Reply | Threaded
Open this post in threaded view
|

[squeak-dev] Re: Collection>>sum implementation

Andreas.Raab
In reply to this post by Klaus D. Witzel
Klaus D. Witzel wrote:
> Then use an object that for #+ returns a copy of the argument (no I'm
> not mentioning nil :)
>
>  ^ myCollection inject: mysticalObject into: [:a :b | a + b]
>
> It *must* return some mysticalObject when myCollection is empty since
> one cannot know the species of elements which are not in a collection.

nil is actually a fine "mysticalObject" and it arises naturally in the
implementation of #sum that I would favor:

Collection>>sum
     "Answers the sum of the elements in the receiver, nil if empty"
     | sum |
     self do:[:value|
         sum := sum ifNil:[value] ifNotNil:[sum + value].
     ].
     ^sum

> But if you want #sum rather than #sumFromZero: then for the numerical
> case it sufficies to do
>
>  mySalarySum := myBigSalaryVector sum + 0

With the above this becomes:

   mySalarySum := myBigSalaryVector sum ifNil:[0 euro].

(assumes unit package with Euro support loaded).

Cheers,
   - Andreas

Reply | Threaded
Open this post in threaded view
|

[squeak-dev] Re: Collection>>sum implementation

Klaus D. Witzel
On Tue, 12 Aug 2008 09:28:22 +0200, Andreas Raab wrote:

> Klaus D. Witzel wrote:
>> Then use an object that for #+ returns a copy of the argument (no I'm  
>> not mentioning nil :)
>>   ^ myCollection inject: mysticalObject into: [:a :b | a + b]
>>  It *must* return some mysticalObject when myCollection is empty since  
>> one cannot know the species of elements which are not in a collection.
>
> nil is actually a fine "mysticalObject"

Sure that is :)

> and it arises naturally in the implementation of #sum that I would favor:
>
> Collection>>sum
>      "Answers the sum of the elements in the receiver, nil if empty"
>      | sum |
>      self do:[:value|
>          sum := sum ifNil:[value] ifNotNil:[sum + value].
>      ].
>      ^sum
>
>> But if you want #sum rather than #sumFromZero: then for the numerical  
>> case it sufficies to do
>>   mySalarySum := myBigSalaryVector sum + 0
>
> With the above this becomes:
>
>    mySalarySum := myBigSalaryVector sum ifNil:[0 euro].

I *love* that elegant use of the power of nil :)

[and I would object accessing inexisting elements for forcing an Error :( ]

big + 1, excellent implementation Andreas

> (assumes unit package with Euro support loaded).
>
> Cheers,
>    - Andreas
>


Reply | Threaded
Open this post in threaded view
|

Re: [squeak-dev] Re: Collection>>sum implementation

Jason Johnson-5
In reply to this post by Andreas.Raab
On Tue, Aug 12, 2008 at 9:28 AM, Andreas Raab <[hidden email]> wrote:

> Klaus D. Witzel wrote:
>>
>> Then use an object that for #+ returns a copy of the argument (no I'm not
>> mentioning nil :)
>>
>>  ^ myCollection inject: mysticalObject into: [:a :b | a + b]
>>
>> It *must* return some mysticalObject when myCollection is empty since one
>> cannot know the species of elements which are not in a collection.
>
> nil is actually a fine "mysticalObject" and it arises naturally in the
> implementation of #sum that I would favor:
>
> Collection>>sum
>    "Answers the sum of the elements in the receiver, nil if empty"
>    | sum |
>    self do:[:value|
>        sum := sum ifNil:[value] ifNotNil:[sum + value].
>    ].
>    ^sum
>
>> But if you want #sum rather than #sumFromZero: then for the numerical case
>> it sufficies to do
>>
>>  mySalarySum := myBigSalaryVector sum + 0
>
> With the above this becomes:
>
>  mySalarySum := myBigSalaryVector sum ifNil:[0 euro].
>
> (assumes unit package with Euro support loaded).
>
> Cheers,
>  - Andreas

Good insight, but also in this implementation you are doing an if
every time through the loop for a condition that can only happen at
the start of the iteration.  In the current implementation and my
proposal the size check happens at the start only once as a
consequence of taking the first element.

If one wanted to handle the empty list case without using exceptions I
think the common pattern would be #ifEmpty/ifNone, no?

So something like:

mySalarySum := myBigSarayVector sumIfEmpty: [0 euro]            "hrm,
how are the selectors generally named for this sort of case?"

Reply | Threaded
Open this post in threaded view
|

Re: [squeak-dev] Re: Collection>>sum implementation

Jason Johnson-5
In reply to this post by Klaus D. Witzel
On Tue, Aug 12, 2008 at 1:32 PM, Klaus D. Witzel <[hidden email]> wrote:
> On Tue, 12 Aug 2008 09:28:22 +0200, Andreas Raab wrote:
>
>> Klaus D. Witzel wrote:
>
> [and I would object accessing inexisting elements for forcing an Error :( ]

It's pretty consistent for the default case to signal an error and
provide an ifNone:/ifEmpty/etc. protocol for people who want other
behavior, no?  Of course unary messages participating in this protocol
do look funny. :)

Reply | Threaded
Open this post in threaded view
|

[squeak-dev] Re: Collection>>sum implementation

Klaus D. Witzel
On Tue, 12 Aug 2008 15:10:57 +0200, Jason Johnson wrote:

> On Tue, Aug 12, 2008 at 1:32 PM, Klaus D. Witzel  
> <[hidden email]> wrote:
>> On Tue, 12 Aug 2008 09:28:22 +0200, Andreas Raab wrote:
>>
>>> Klaus D. Witzel wrote:
>>
>> [and I would object accessing inexisting elements for forcing an Error  
>> :( ]
>
> It's pretty consistent for the default case to signal an error

This looks perhaps good but the problem here is that #anyOne causes an  
indexing Error which n00bs and casual users will not immediately associate  
with the cause (and it is raised by a "remote" method which knows nothing  
about #sum). Users expect that Smalltalk's own collection protocols is  
resistant against indexing errors as usual, if you know what I mean ;)

Therefore I'm in favor of Andreas' #ifNil:, not only because of its  
elegance :)

And the essence of (x := y sum ifNil: [whatMyZeroIs]) is that the zero  
element is defined by the user, not the implementor of #sum ... (and  
#ifNil: can be omitted if you know your collections).

Regarding emptyCheck against #ifNil, your previous Q: during the loop, the  
former is a full send (with even more own sends) but the latter is  
inlined. For small collections one wouldn't notice a significant  
difference. And if collections are soo huuge (Smalltalk's huge ;) then  
people often implement/use the #fast* version of things.

> and
> provide an ifNone:/ifEmpty/etc. protocol for people who want other
> behavior, no?  Of course unary messages participating in this protocol
> do look funny. :)

Sure, alternatives are appreciated.


Reply | Threaded
Open this post in threaded view
|

RE: [squeak-dev] Re: Collection>>sum implementation

Ramon Leon-5
In reply to this post by Andreas.Raab
> nil is actually a fine "mysticalObject" and it arises
> naturally in the
> implementation of #sum that I would favor:
>
> Collection>>sum
>      "Answers the sum of the elements in the receiver, nil if empty"
>      | sum |
>      self do:[:value|
>          sum := sum ifNil:[value] ifNotNil:[sum + value].
>      ].
>      ^sum
>
> > But if you want #sum rather than #sumFromZero: then for the
> numerical
> > case it sufficies to do
> >
> >  mySalarySum := myBigSalaryVector sum + 0
>
> With the above this becomes:
>
>    mySalarySum := myBigSalaryVector sum ifNil:[0 euro].
>
> (assumes unit package with Euro support loaded).
>
> Cheers,
>    - Andreas

+1 on this, most idiomatic looking solution so far since using ifNil: is a
common idiom for defaulting a value.  It also properly handles collections
which might contain nil's by skipping over the nil while summing without
assuming any ordering by trying to use #first.

Ramon Leon


Reply | Threaded
Open this post in threaded view
|

Re: [squeak-dev] Re: Collection>>sum implementation

Randal L. Schwartz
>>>>> "Ramon" == Ramon Leon <[hidden email]> writes:

Ramon> +1 on this, most idiomatic looking solution so far since using ifNil: is a
Ramon> common idiom for defaulting a value.  It also properly handles collections
Ramon> which might contain nil's by skipping over the nil while summing without
Ramon> assuming any ordering by trying to use #first.

Not really.  It actually makes nil a special positional value.

{ nil . 3 } sum => 3
{ 3 . nil } sum => "can't add 3 + nil"

I don't like it from that perspective.  Any/all leading nils would
be stripped, but other nils would be included.  But what does "leading"
mean in an UnorderedCollection?  Exactly.

--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<[hidden email]> <URL:http://www.stonehenge.com/merlyn/>
Smalltalk/Perl/Unix consulting, Technical writing, Comedy, etc. etc.
See http://methodsandmessages.vox.com/ for Smalltalk and Seaside discussion

Reply | Threaded
Open this post in threaded view
|

[squeak-dev] Re: Collection>>sum implementation

Andreas.Raab
Randal L. Schwartz wrote:
>>>>>> "Ramon" == Ramon Leon <[hidden email]> writes:
>
> Ramon> +1 on this, most idiomatic looking solution so far since using ifNil: is a
> Ramon> common idiom for defaulting a value.  It also properly handles collections
> Ramon> which might contain nil's by skipping over the nil while summing without
> Ramon> assuming any ordering by trying to use #first.
>
> Not really.  It actually makes nil a special positional value.

I think we'll have to agree that the result is undefined if the
collection contains elements that can't be summed up. While one can
rewrite the method to deal with the specific problem you are describing
like here:

Collection>>sum
    "Answer the sum of elements, or nil if empty"
    | first sum |
    first := true.
    self do:[:each|
      first
        ifTrue:[sum := each. first := false]
        ifFalse:[sum := sum + each]].
   ^sum

such that

   {nil. 3} sum. => Error
   {3. nil} sum. => Error

it just seems to me that the result of, e.g.,

  #() sum => nil
  {nil} sum => nil

is just as bad. I think that every proposed implementation has problems
similar to these and I don't think it's addressable without unreasonable
burden on both elements as well as implementation.

Cheers,
   - Andreas

Reply | Threaded
Open this post in threaded view
|

RE: [squeak-dev] Re: Collection>>sum implementation

Ramon Leon-5
In reply to this post by Randal L. Schwartz
> Not really.  It actually makes nil a special positional value.
>
> { nil . 3 } sum => 3
> { 3 . nil } sum => "can't add 3 + nil"
>
> I don't like it from that perspective.  Any/all leading nils would
> be stripped, but other nils would be included.  But what does
> "leading"
> mean in an UnorderedCollection?  Exactly.

Ah, you're right, ordering shouldn't matter, this behaves as I expected...

sum
    "Answers the sum of the elements in the receiver, nil if empty"
    | sum |
    self do:
        [ :value |
        sum := sum
            ifNil: [ value ]
            ifNotNil:
                [ value
                    ifNil: [ sum ]
                    ifNotNil: [ sum + value ] ] ].
    ^ sum

Ramon Leon
http://onsmalltalk.com


Reply | Threaded
Open this post in threaded view
|

Re: [squeak-dev] Re: Collection>>sum implementation

Jason Johnson-5
In reply to this post by Andreas.Raab
On Tue, Aug 12, 2008 at 8:02 PM, Andreas Raab <[hidden email]> wrote:

>
> I think we'll have to agree that the result is undefined if the collection
> contains elements that can't be summed up. While one can rewrite the method
> to deal with the specific problem you are describing like here:
>
> Collection>>sum
>   "Answer the sum of elements, or nil if empty"
>   | first sum |
>   first := true.
>   self do:[:each|
>     first
>       ifTrue:[sum := each. first := false]
>       ifFalse:[sum := sum + each]].
>  ^sum
>
> such that
>
>  {nil. 3} sum. => Error
>  {3. nil} sum. => Error
>
> it just seems to me that the result of, e.g.,
>
>  #() sum => nil
>  {nil} sum => nil
>
> is just as bad. I think that every proposed implementation has problems
> similar to these and I don't think it's addressable without unreasonable
> burden on both elements as well as implementation.
>
> Cheers,
>  - Andreas

>From my side, I just see #sum as convenience math utility (like
#average which is also in there).  I just wanted to see #reduce
included so this "add a sample then take it out again" could go away.
Purely for a teaching point of view, that is, I would rather new
programmers learn to refactor and build the language to their needs
instead of making strange work arounds to deal with "close but not
quite right" methods.