Smalltalk › Squeak › Squeak - Dev

[squeak-dev] Collection>>sum implementation

_‹ Previous Topic Next Topic _›

Classic

List

Threaded

22 messages Options

Jason Johnson-5

[squeak-dev] Collection>>sum implementation

Hello all,

In my explorations today I happened on the implementation of the #sum method:

Collection>>sum
"This is implemented using a variant of the normal inject:into: pattern.
The reason for this is that it is not known whether we're in the normal
number line, i.e. whether 0 is a good initial value for the sum.
Consider a collection of measurement objects, 0 would be the unitless
value and would not be appropriate to add with the unit-ed objects."
| sum sample |
sample _ self anyOne.
sum _ self inject: sample into: [:accum :each | accum + each].
^ sum - sample

Personally, I view the code in the core image to, among other things,
serve the role of showing new Smalltalk programmers good ways of using
the language. This is exactly the kind of thing I would *not* want
people to see.

The obvious issue here is that #inject:into: doesn't fit this case
very well. But rather then go to these kinds of steps a much better
solution would be to simply notice that a specialized variation of
#inject:into: (that is in fact quite common) is needed here. Namely
a reduction method that just uses the first element as the starting
value, conceptually:

Collection>>reduce: aBlock
^ self allButFirst inject: self first into: aBlock

As it turns out there exists a more efficient implementation of this
method in Lukas' Magritte-Model package. I would propose his method
be promoted to a core method so #sum could be rewritten as:

sum
"sum the reciever"
^ self reduce: [:accum :each| accum + each ]

Thanks,
Jason

Paolo Bonzini-2

[squeak-dev] Re: Collection>>sum implementation

> As it turns out there exists a more efficient implementation of this
> method in Lukas' Magritte-Model package. I would propose his method
> be promoted to a core method so #sum could be rewritten as:
>
> sum
> "sum the reciever"
> ^ self reduce: [:accum :each| accum + each ]

This method is called #fold: in VisualWorks, GNU Smalltalk and others.

Paolo

Stéphane Rollandin

Re: [squeak-dev] Collection>>sum implementation

In reply to this post by Jason Johnson-5

>
> Collection>>reduce: aBlock
> ^ self allButFirst inject: self first into: aBlock
>

#allButFirst copies the whole collection, where in the current #sum
implementation there is no such copy. we should avoid slowing down the
method.

Stef

Paolo Bonzini-2

[squeak-dev] Re: Collection>>sum implementation

>> Collection>>reduce: aBlock
>> ^ self allButFirst inject: self first into: aBlock
>>
>
> #allButFirst copies the whole collection, where in the current #sum
> implementation there is no such copy. we should avoid slowing down the
> method.

That's just an idea, the actual implementation would be

marker := value := Object new.
self do: [ :each |
value := marker == value ifTrue: [ each ]
ifFalse: [ aBlock value: value value: each ] ]

with an optimized implementation for SequenceableCollection.

Paolo

Jason Johnson-5

Re: [squeak-dev] Re: Collection>>sum implementation

In reply to this post by Paolo Bonzini-2

On Mon, Aug 11, 2008 at 3:48 PM, Paolo Bonzini <[hidden email]> wrote:

>
>> As it turns out there exists a more efficient implementation of this
>> method in Lukas' Magritte-Model package. I would propose his method
>> be promoted to a core method so #sum could be rewritten as:
>>
>> sum
>> "sum the reciever"
>> ^ self reduce: [:accum :each| accum + each ]
>
> This method is called #fold: in VisualWorks, GNU Smalltalk and others.
>
> Paolo

Ah ok. It's called reduce in python and Lisp, and that's also what
Lukas named it in his package.

Jason Johnson-5

Re: [squeak-dev] Collection>>sum implementation

In reply to this post by Stéphane Rollandin

On Mon, Aug 11, 2008 at 5:16 PM, Stéphane Rollandin
<[hidden email]> wrote:

>
>>
>> Collection>>reduce: aBlock
>> ^ self allButFirst inject: self first into: aBlock
>>
>
> #allButFirst copies the whole collection, where in the current #sum
> implementation there is no such copy. we should avoid slowing down the
> method.
>
> Stef

I know full well what allButFirst does. That's why I said
*conceptually*. I wanted to describe the idea in as concise a manner
as possible. Please read the method *I actually suggested to be
included*. No copy done there.

Tony Garnock-Jones-2

Re: [squeak-dev] Collection>>sum implementation

In reply to this post by Jason Johnson-5

Jason Johnson wrote:
> sum
> "sum the reciever"
> ^ self reduce: [:accum :each| accum + each ]

Yuck. (Not to the solution, necessarily, but to the whole idea of "sum".)

This is the kind of problem that can't be solved well without a
type-inferencing system (for instance, Haskell can implement polymorphic
sum over possibly-empty lists well, so long as the instance of the type
class of the elements of the collection has a definition for an
appropriate zero), or explicit guidance from the programmer.

sumWithZero: aZero
^ self inject: aZero into: [:accum :each | each + accum]

...and remove sum entirely. Note that sum doesn't work on empty
collections (!) but sumWithZero: does.

Tony Garnock-Jones-2

Re: [squeak-dev] Collection>>sum implementation

Tony Garnock-Jones wrote:
> a definition for an
> appropriate zero

Another option is to make zero in Smalltalk inhabit multiple types! (In
a similar way to the way nil in Smalltalk inhabits multiple types) Then,

sum
^ self inject: 0 into: [:accum :each | accum + each]

...would work, so long as the elements treated 0 as a zero for #+.

Paolo Bonzini-2

[squeak-dev] Re: Collection>>sum implementation

In reply to this post by Jason Johnson-5

>> This method is called #fold: in VisualWorks, GNU Smalltalk and others.
>>
>> Paolo
>
> Ah ok. It's called reduce in python and Lisp, and that's also what
> Lukas named it in his package.

I think #fold: is the Haskell name (they have both foldl and foldr), but
probably it comes from somewhere else. Eliot should know...

Paolo

Stéphane Rollandin

Re: [squeak-dev] Collection>>sum implementation

In reply to this post by Jason Johnson-5

Jason Johnson a écrit :
> I know full well what allButFirst does. That's why I said
> *conceptually*. I wanted to describe the idea in as concise a manner
> as possible. Please read the method *I actually suggested to be
> included*. No copy done there.

wow, I did not mean any offense. just my 2 cents. sorry if it felt like
I was giving you a lesson or something...

Stef

Yoshiki Ohshima-2

Re: [squeak-dev] Relicensing signature (Collection>>sum implementation)

In reply to this post by Jason Johnson-5

BTW, does anybody have close contact (i.e., working in the same
building, etc.) with Travis Griggs, who has the author initials of
these methods? The record indicates that he hasn't turn the signature
in, and there are a few methods by him:

'TAG'->methods by Travis Griggs
Collection >> floor
Collection >> average
Collection >> sqrt
Collection >> truncated
Collection >> log
Collection >> reciprocal
Collection >> median
Collection >> min
Collection >> ceiling
Collection >> negated
Collection >> max
Collection >> abs
Collection >> range
Collection >> rounded
Collection >> squared
Collection >> sum
SequenceableCollection >> @

Yes, the current implementation of #sum isn't cool^^;

-- Yoshiki

Eliot Miranda-2

Re: [squeak-dev] Re: Collection>>sum implementation

In reply to this post by Paolo Bonzini-2

On Mon, Aug 11, 2008 at 9:21 AM, Paolo Bonzini <[hidden email]> wrote:

This method is called #fold: in VisualWorks, GNU Smalltalk and others.

Paolo

Ah ok. It's called reduce in python and Lisp, and that's also what
Lukas named it in his package.

I think #fold: is the Haskell name (they have both foldl and foldr), but probably it comes from somewhere else. Eliot should know...

I wish I knew for sure. I think it is called fold in certain lisps too. But I defer to Wikipedia. They prefer fold but also give reduce, accumulate, compress & inject (their emphasis) as other names.

Paolo

Jason Johnson-5

Re: [squeak-dev] Re: Collection>>sum implementation

In reply to this post by Paolo Bonzini-2

On Mon, Aug 11, 2008 at 6:21 PM, Paolo Bonzini <[hidden email]> wrote:

>>> This method is called #fold: in VisualWorks, GNU Smalltalk and others.
>>>
>>> Paolo
>>
>> Ah ok. It's called reduce in python and Lisp, and that's also what
>> Lukas named it in his package.
>
> I think #fold: is the Haskell name (they have both foldl and foldr), but
> probably it comes from somewhere else. Eliot should know...
>
> Paolo

I don't know if Haskell has what I'm calling the "reduce operator".
They have foldl and foldr, which are like #inject:into: but have a
slightly different behavior. The easiest way to understand it is to
imagine a list; #(1 2 3) as:

1 : 2 : 3 : []

where [] is an empty list and : is an operator the appends the left
side to the right. So foldl and foldr both replace the : with the
function they are passed and the [] with the initial value they are
passed. The difference between the two is which way they then
resolve. So if they did provide a "resolve" operation (i.e. take one
less value and use the first element of the list as the "initial
value") I would expect there to be two of them.

#inject:into: behaves like a right fold (reduction moves from left to
right) but with the [] on the front of the list instead of the end.

Jason Johnson-5

Re: [squeak-dev] Collection>>sum implementation

In reply to this post by Tony Garnock-Jones-2

On Mon, Aug 11, 2008 at 6:07 PM, Tony Garnock-Jones <[hidden email]> wrote:

> Jason Johnson wrote:
>>
>> sum
>> "sum the reciever"
>> ^ self reduce: [:accum :each| accum + each ]
>
> Yuck. (Not to the solution, necessarily, but to the whole idea of "sum".)
>
> This is the kind of problem that can't be solved well without a
> type-inferencing system (for instance, Haskell can implement polymorphic sum
> over possibly-empty lists well, so long as the instance of the type class of
> the elements of the collection has a definition for an appropriate zero), or
> explicit guidance from the programmer.
>
> sumWithZero: aZero
> ^ self inject: aZero into: [:accum :each | each + accum]
>
> ...and remove sum entirely. Note that sum doesn't work on empty collections
> (!) but sumWithZero: does.

I think I understand what you're saying, but I disagree. In a late
bound system we try it and if the types are not what our code expected
it fails.

In this case the only problem is the case of: #() sum.
The problem is we don't know what type of objects the collection was
expected to have (and the collection could be hetrogenious anyway) so
there is no suitable "nothing" answer. So I would propose that #sum
on an empty collection is simply an error (and I believe later in the
list it is said to actually do that).

Tony Garnock-Jones-2

Re: [squeak-dev] Re: Collection>>sum implementation

In reply to this post by Jason Johnson-5

Jason Johnson wrote:
> #inject:into: behaves like a right fold (reduction moves from left to
> right) but with the [] on the front of the list instead of the end.

It's actually behaving like a *left* fold. The leftness/rightness of a
fold is named after the left/right associativity that the combining
operator is given -- the difference between

((([] : 1) : 2) : 3) -- a left fold, with (:) invoked
-- left-associatively

and

(1 : (2 : (3 : []))) -- a right fold, with (:) invoked
-- right-associatively

In haskell,

foldr (:) [] =~= id

and

foldl (flip (:)) [] === reverse

Regards,
Tony

Tony Garnock-Jones-2

Re: [squeak-dev] Collection>>sum implementation

In reply to this post by Jason Johnson-5

Jason Johnson wrote:
> So I would propose that #sum
> on an empty collection is simply an error (and I believe later in the
> list it is said to actually do that).

Yes, an error is currently signalled. Which makes the general case:

aCollection ifEmpty: [0] ifNotEmpty: [aCollection sum].

... which, to me, is a bad code smell. Then again, I've been an FPer for
a long while. The ifEmpty:ifNotEmpty: corresponds to a "reduce" in
SRFI-1 [1] terminology, rather than a proper "fold", and folds are
easier to use to write correct general code, in my experience.

Maybe sumWithZero: was a silly name -- sumFrom: is perhaps better? I
guess this is all bike-sheds, though, since #sum is firmly established,
awkwardness of type, of interface and of implementation notwithstanding.

sumFrom: x
^ self inject: x into: [:accum :each | accum + each]

Regards,
Tony

[1]: http://srfi.schemers.org/srfi-1/srfi-1.html

Jason Johnson-5

Re: [squeak-dev] Re: Collection>>sum implementation

In reply to this post by Tony Garnock-Jones-2

On Tue, Aug 12, 2008 at 11:21 AM, Tony Garnock-Jones <[hidden email]> wrote:

> Jason Johnson wrote:
>>
>> #inject:into: behaves like a right fold (reduction moves from left to
>> right) but with the [] on the front of the list instead of the end.
>
> It's actually behaving like a *left* fold. The leftness/rightness of a fold
> is named after the left/right associativity that the combining operator is
> given -- the difference between
>
> ((([] : 1) : 2) : 3) -- a left fold, with (:) invoked
> -- left-associatively
>
> and
>
> (1 : (2 : (3 : []))) -- a right fold, with (:) invoked
> -- right-associatively
>
> In haskell,
>
> foldr (:) [] =~= id
>
> and
>
> foldl (flip (:)) [] === reverse
>
> Regards,
> Tony

Ah right, left is the one that can work on infinite sets (in Haskell
at least). The diagram I looked at before I sent this message
appeared to have right fold mean the computation moves from left to
right.

Colin Putney

Re: [squeak-dev] Collection>>sum implementation

In reply to this post by Tony Garnock-Jones-2

On 12-Aug-08, at 2:30 AM, Tony Garnock-Jones wrote:

> Yes, an error is currently signalled. Which makes the general case:
>
> aCollection ifEmpty: [0] ifNotEmpty: [aCollection sum].
>
> .. which, to me, is a bad code smell. Then again, I've been an FPer
> for a long while.

It's a code smell from an OO perspective too.

This is a bit like the #join: debate that comes up every now and then
when a Ruby/Python/Perl ex-pat asks how it is that we don't have a
method for joining an array of strings into a single string. The short
answer is that it's because #join: only makes sense when all the
elements of a collection are strings, and therefore, it's bad design.

This is similar. I'd say the best way to improve #sum would be to
delete it.

Ok, compatibility is an issue, breaking existing code would be bad
etc. So leave it as is and point anyone who complains to
#inject:into:. If that's really too verbose, then what's really needed
is a more specific class that understands what this collection of
numbers represents.

Colin

Paolo Bonzini-2

[squeak-dev] another example of #reduce:/#fold: (Re: Collection>>sum implementation)

> This is a bit like the #join: debate that comes up every now and then
> when a Ruby/Python/Perl ex-pat asks how it is that we don't have a
> method for joining an array of strings into a single string. The short
> answer is that it's because #join: only makes sense when all the
> elements of a collection are strings, and therefore, it's bad design.

I disagree. Indeed, if you think of #join: as

SequenceableCollection>>join: sep
^self fold: [ :a :b | a, sep, b ]

(well, right, if the collection is empty the behavior is different), it
becomes obvious to have an optimized implementation return an object of
class "self first species".

Paolo

Tony Garnock-Jones-2

Re: [squeak-dev] Collection>>sum implementation

In reply to this post by Colin Putney

Colin Putney wrote:
> I'd say the best way to improve #sum would be to delete
> it.

Amen.