[squeak-dev] Collection>>sum implementation

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
22 messages Options
12
Reply | Threaded
Open this post in threaded view
|

[squeak-dev] Collection>>sum implementation

Jason Johnson-5
Hello all,

In my explorations today I happened on the implementation of the #sum method:

Collection>>sum
        "This is implemented using a variant of the normal inject:into: pattern.
        The reason for this is that it is not known whether we're in the normal
        number line, i.e. whether 0 is a good initial value for the sum.
        Consider a collection of measurement objects, 0 would be the unitless
        value and would not be appropriate to add with the unit-ed objects."
        | sum sample |
        sample _ self anyOne.
        sum _ self inject: sample into: [:accum :each | accum + each].
        ^ sum - sample


Personally, I view the code in the core image to, among other things,
serve the role of showing new Smalltalk programmers good ways of using
the language.  This is exactly the kind of thing I would *not* want
people to see.

The obvious issue here is that #inject:into: doesn't fit this case
very well.  But rather then go to these kinds of steps a much better
solution would be to simply notice that a specialized variation of
#inject:into: (that is in fact  quite common) is needed here.  Namely
a reduction method that just uses the first element as the starting
value, conceptually:

Collection>>reduce: aBlock
      ^ self allButFirst inject: self first into: aBlock

As it turns out there exists a more efficient implementation of this
method in Lukas' Magritte-Model package.  I would propose his method
be promoted to a core method so #sum could be rewritten as:

sum
     "sum the reciever"
     ^ self reduce: [:accum :each| accum + each ]

Thanks,
Jason

Reply | Threaded
Open this post in threaded view
|

[squeak-dev] Re: Collection>>sum implementation

Paolo Bonzini-2

> As it turns out there exists a more efficient implementation of this
> method in Lukas' Magritte-Model package.  I would propose his method
> be promoted to a core method so #sum could be rewritten as:
>
> sum
>      "sum the reciever"
>      ^ self reduce: [:accum :each| accum + each ]

This method is called #fold: in VisualWorks, GNU Smalltalk and others.

Paolo

Reply | Threaded
Open this post in threaded view
|

Re: [squeak-dev] Collection>>sum implementation

Stéphane Rollandin
In reply to this post by Jason Johnson-5

>
> Collection>>reduce: aBlock
>       ^ self allButFirst inject: self first into: aBlock
>

#allButFirst copies the whole collection, where in the current #sum
implementation there is no such copy. we should avoid slowing down the
method.

Stef


Reply | Threaded
Open this post in threaded view
|

[squeak-dev] Re: Collection>>sum implementation

Paolo Bonzini-2

>> Collection>>reduce: aBlock
>>       ^ self allButFirst inject: self first into: aBlock
>>
>
> #allButFirst copies the whole collection, where in the current #sum
> implementation there is no such copy. we should avoid slowing down the
> method.

That's just an idea, the actual implementation would be

   marker := value := Object new.
   self do: [ :each |
     value := marker == value ifTrue: [ each ]
       ifFalse: [ aBlock value: value value: each ] ]

with an optimized implementation for SequenceableCollection.

Paolo

Reply | Threaded
Open this post in threaded view
|

Re: [squeak-dev] Re: Collection>>sum implementation

Jason Johnson-5
In reply to this post by Paolo Bonzini-2
On Mon, Aug 11, 2008 at 3:48 PM, Paolo Bonzini <[hidden email]> wrote:

>
>> As it turns out there exists a more efficient implementation of this
>> method in Lukas' Magritte-Model package.  I would propose his method
>> be promoted to a core method so #sum could be rewritten as:
>>
>> sum
>>     "sum the reciever"
>>     ^ self reduce: [:accum :each| accum + each ]
>
> This method is called #fold: in VisualWorks, GNU Smalltalk and others.
>
> Paolo

Ah ok.  It's called reduce in python and Lisp, and that's also what
Lukas named it in his package.

Reply | Threaded
Open this post in threaded view
|

Re: [squeak-dev] Collection>>sum implementation

Jason Johnson-5
In reply to this post by Stéphane Rollandin
On Mon, Aug 11, 2008 at 5:16 PM, Stéphane Rollandin
<[hidden email]> wrote:

>
>>
>> Collection>>reduce: aBlock
>>      ^ self allButFirst inject: self first into: aBlock
>>
>
> #allButFirst copies the whole collection, where in the current #sum
> implementation there is no such copy. we should avoid slowing down the
> method.
>
> Stef

I know full well what allButFirst does.  That's why I said
*conceptually*.  I wanted to describe the idea in as concise a manner
as possible.  Please read the method *I actually suggested to be
included*.  No copy done there.

Reply | Threaded
Open this post in threaded view
|

Re: [squeak-dev] Collection>>sum implementation

Tony Garnock-Jones-2
In reply to this post by Jason Johnson-5
Jason Johnson wrote:
> sum
>      "sum the reciever"
>      ^ self reduce: [:accum :each| accum + each ]

Yuck. (Not to the solution, necessarily, but to the whole idea of "sum".)

This is the kind of problem that can't be solved well without a
type-inferencing system (for instance, Haskell can implement polymorphic
sum over possibly-empty lists well, so long as the instance of the type
class of the elements of the collection has a definition for an
appropriate zero), or explicit guidance from the programmer.

sumWithZero: aZero
   ^ self inject: aZero into: [:accum :each | each + accum]

...and remove sum entirely. Note that sum doesn't work on empty
collections (!) but sumWithZero: does.


Reply | Threaded
Open this post in threaded view
|

Re: [squeak-dev] Collection>>sum implementation

Tony Garnock-Jones-2
Tony Garnock-Jones wrote:
> a definition for an
> appropriate zero

Another option is to make zero in Smalltalk inhabit multiple types! (In
a similar way to the way nil in Smalltalk inhabits multiple types) Then,

sum
   ^ self inject: 0 into: [:accum :each | accum + each]

...would work, so long as the elements treated 0 as a zero for #+.


Reply | Threaded
Open this post in threaded view
|

[squeak-dev] Re: Collection>>sum implementation

Paolo Bonzini-2
In reply to this post by Jason Johnson-5
>> This method is called #fold: in VisualWorks, GNU Smalltalk and others.
>>
>> Paolo
>
> Ah ok.  It's called reduce in python and Lisp, and that's also what
> Lukas named it in his package.

I think #fold: is the Haskell name (they have both foldl and foldr), but
probably it comes from somewhere else.  Eliot should know...

Paolo

Reply | Threaded
Open this post in threaded view
|

Re: [squeak-dev] Collection>>sum implementation

Stéphane Rollandin
In reply to this post by Jason Johnson-5
Jason Johnson a écrit :
> I know full well what allButFirst does.  That's why I said
> *conceptually*.  I wanted to describe the idea in as concise a manner
> as possible.  Please read the method *I actually suggested to be
> included*.  No copy done there.

wow, I did not mean any offense. just my 2 cents. sorry if it felt like
I was giving you a lesson or something...

Stef




Reply | Threaded
Open this post in threaded view
|

Re: [squeak-dev] Relicensing signature (Collection>>sum implementation)

Yoshiki Ohshima-2
In reply to this post by Jason Johnson-5
  BTW, does anybody have close contact (i.e., working in the same
building, etc.) with Travis Griggs, who has the author initials of
these methods?  The record indicates that he hasn't turn the signature
in, and there are a few methods by him:

 'TAG'->methods by Travis Griggs
         Collection >> floor
         Collection >> average
         Collection >> sqrt
         Collection >> truncated
         Collection >> log
         Collection >> reciprocal
         Collection >> median
         Collection >> min
         Collection >> ceiling
         Collection >> negated
         Collection >> max
         Collection >> abs
         Collection >> range
         Collection >> rounded
         Collection >> squared
         Collection >> sum
         SequenceableCollection >> @

Yes, the current implementation of #sum isn't cool^^;

-- Yoshiki

Reply | Threaded
Open this post in threaded view
|

Re: [squeak-dev] Re: Collection>>sum implementation

Eliot Miranda-2
In reply to this post by Paolo Bonzini-2


On Mon, Aug 11, 2008 at 9:21 AM, Paolo Bonzini <[hidden email]> wrote:
This method is called #fold: in VisualWorks, GNU Smalltalk and others.

Paolo

Ah ok.  It's called reduce in python and Lisp, and that's also what
Lukas named it in his package.

I think #fold: is the Haskell name (they have both foldl and foldr), but probably it comes from somewhere else.  Eliot should know...

I wish I knew for sure.  I think it is called fold in certain lisps too.  But I defer to Wikipedia. They prefer fold but also give reduceaccumulatecompress & inject (their emphasis) as other names.



Paolo




Reply | Threaded
Open this post in threaded view
|

Re: [squeak-dev] Re: Collection>>sum implementation

Jason Johnson-5
In reply to this post by Paolo Bonzini-2
On Mon, Aug 11, 2008 at 6:21 PM, Paolo Bonzini <[hidden email]> wrote:

>>> This method is called #fold: in VisualWorks, GNU Smalltalk and others.
>>>
>>> Paolo
>>
>> Ah ok.  It's called reduce in python and Lisp, and that's also what
>> Lukas named it in his package.
>
> I think #fold: is the Haskell name (they have both foldl and foldr), but
> probably it comes from somewhere else.  Eliot should know...
>
> Paolo

I don't know if Haskell has what I'm calling the "reduce operator".
They have foldl and foldr, which are like #inject:into: but have a
slightly different behavior.  The easiest way to understand it is to
imagine a list; #(1 2 3) as:

1 : 2 : 3 : []

where [] is an empty list and : is an operator the appends the left
side to the right.  So foldl and foldr both replace the : with the
function they are passed and the [] with the initial value they are
passed.  The difference between the two is which way they then
resolve.  So if they did provide a "resolve" operation (i.e. take one
less value and use the first element of the list as the "initial
value") I would expect there to be two of them.

#inject:into: behaves like a right fold (reduction moves from left to
right) but with the [] on the front of the list instead of the end.

Reply | Threaded
Open this post in threaded view
|

Re: [squeak-dev] Collection>>sum implementation

Jason Johnson-5
In reply to this post by Tony Garnock-Jones-2
On Mon, Aug 11, 2008 at 6:07 PM, Tony Garnock-Jones <[hidden email]> wrote:

> Jason Johnson wrote:
>>
>> sum
>>     "sum the reciever"
>>     ^ self reduce: [:accum :each| accum + each ]
>
> Yuck. (Not to the solution, necessarily, but to the whole idea of "sum".)
>
> This is the kind of problem that can't be solved well without a
> type-inferencing system (for instance, Haskell can implement polymorphic sum
> over possibly-empty lists well, so long as the instance of the type class of
> the elements of the collection has a definition for an appropriate zero), or
> explicit guidance from the programmer.
>
> sumWithZero: aZero
>  ^ self inject: aZero into: [:accum :each | each + accum]
>
> ...and remove sum entirely. Note that sum doesn't work on empty collections
> (!) but sumWithZero: does.

I think I understand what you're saying, but I disagree.  In a late
bound system we try it and if the types are not what our code expected
it fails.

In this case the only problem is the case of:        #() sum.
The problem is we don't know what type of objects the collection was
expected to have (and the collection could be hetrogenious anyway) so
there is no suitable "nothing" answer.  So I would propose that #sum
on an empty collection is simply an error (and I believe later in the
list it is said to actually do that).

Reply | Threaded
Open this post in threaded view
|

Re: [squeak-dev] Re: Collection>>sum implementation

Tony Garnock-Jones-2
In reply to this post by Jason Johnson-5
Jason Johnson wrote:
> #inject:into: behaves like a right fold (reduction moves from left to
> right) but with the [] on the front of the list instead of the end.

It's actually behaving like a *left* fold. The leftness/rightness of a
fold is named after the left/right associativity that the combining
operator is given -- the difference between

   ((([] : 1) : 2) : 3)       -- a left fold, with (:) invoked
                              -- left-associatively

and

   (1 : (2 : (3 : [])))       -- a right fold, with (:) invoked
                              -- right-associatively

In haskell,

   foldr (:) [] =~= id

and

   foldl (flip (:)) [] === reverse

Regards,
   Tony


Reply | Threaded
Open this post in threaded view
|

Re: [squeak-dev] Collection>>sum implementation

Tony Garnock-Jones-2
In reply to this post by Jason Johnson-5
Jason Johnson wrote:
> So I would propose that #sum
> on an empty collection is simply an error (and I believe later in the
> list it is said to actually do that).

Yes, an error is currently signalled. Which makes the general case:

aCollection ifEmpty: [0] ifNotEmpty: [aCollection sum].

... which, to me, is a bad code smell. Then again, I've been an FPer for
a long while. The ifEmpty:ifNotEmpty: corresponds to a "reduce" in
SRFI-1 [1] terminology, rather than a proper "fold", and folds are
easier to use to write correct general code, in my experience.


Maybe sumWithZero: was a silly name -- sumFrom: is perhaps better? I
guess this is all bike-sheds, though, since #sum is firmly established,
awkwardness of type, of interface and of implementation notwithstanding.

sumFrom: x
   ^ self inject: x into: [:accum :each | accum + each]


Regards,
   Tony


[1]: http://srfi.schemers.org/srfi-1/srfi-1.html

Reply | Threaded
Open this post in threaded view
|

Re: [squeak-dev] Re: Collection>>sum implementation

Jason Johnson-5
In reply to this post by Tony Garnock-Jones-2
On Tue, Aug 12, 2008 at 11:21 AM, Tony Garnock-Jones <[hidden email]> wrote:

> Jason Johnson wrote:
>>
>> #inject:into: behaves like a right fold (reduction moves from left to
>> right) but with the [] on the front of the list instead of the end.
>
> It's actually behaving like a *left* fold. The leftness/rightness of a fold
> is named after the left/right associativity that the combining operator is
> given -- the difference between
>
>  ((([] : 1) : 2) : 3)       -- a left fold, with (:) invoked
>                             -- left-associatively
>
> and
>
>  (1 : (2 : (3 : [])))       -- a right fold, with (:) invoked
>                             -- right-associatively
>
> In haskell,
>
>  foldr (:) [] =~= id
>
> and
>
>  foldl (flip (:)) [] === reverse
>
> Regards,
>  Tony

Ah right, left is the one that can work on infinite sets (in Haskell
at least).  The diagram I looked at before I sent this message
appeared to have right fold mean the computation moves from left to
right.

Reply | Threaded
Open this post in threaded view
|

Re: [squeak-dev] Collection>>sum implementation

Colin Putney
In reply to this post by Tony Garnock-Jones-2

On 12-Aug-08, at 2:30 AM, Tony Garnock-Jones wrote:

> Yes, an error is currently signalled. Which makes the general case:
>
> aCollection ifEmpty: [0] ifNotEmpty: [aCollection sum].
>
> .. which, to me, is a bad code smell. Then again, I've been an FPer  
> for a long while.

It's a code smell from an OO perspective too.

This is  a bit like the #join: debate that comes up every now and then  
when a Ruby/Python/Perl ex-pat asks how it is that we don't have  a  
method for joining an array of strings into a single string. The short  
answer is that it's because #join: only makes sense when all the  
elements of a collection are strings, and therefore, it's bad design.

This is similar. I'd say the best way to improve #sum would be to  
delete it.

Ok, compatibility is an issue, breaking existing code would be bad  
etc. So leave it as is and point anyone who complains to  
#inject:into:. If that's really too verbose, then what's really needed  
is a more specific class that understands what this collection of  
numbers represents.

Colin

Reply | Threaded
Open this post in threaded view
|

[squeak-dev] another example of #reduce:/#fold: (Re: Collection>>sum implementation)

Paolo Bonzini-2
> This is  a bit like the #join: debate that comes up every now and then
> when a Ruby/Python/Perl ex-pat asks how it is that we don't have  a
> method for joining an array of strings into a single string. The short
> answer is that it's because #join: only makes sense when all the
> elements of a collection are strings, and therefore, it's bad design.

I disagree.  Indeed, if you think of #join: as

   SequenceableCollection>>join: sep
      ^self fold: [ :a :b | a, sep, b ]

(well, right, if the collection is empty the behavior is different), it
becomes obvious to have an optimized implementation return an object of
class "self first species".

Paolo

Reply | Threaded
Open this post in threaded view
|

Re: [squeak-dev] Collection>>sum implementation

Tony Garnock-Jones-2
In reply to this post by Colin Putney
Colin Putney wrote:
> I'd say the best way to improve #sum would be to delete
> it.

Amen.


12