Collection's #fold: vs #reduce:

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
21 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Collection's #fold: vs #reduce:

Levente Uzonyi-2
Hi,

we have these two methods which do exactly the same thing. #reduce: was
added by Andreas during the developement of Squeak 4.1. #fold was added by
Eliot for Cog VMMaker compatibility. One of them is superfluous. I can
image the following solutions:

1) Replace senders of #fold: in VMMaker to use #reduce:, and remove
#fold: from the image.

2) Replace #fold:'s implementation to self reduce: aBlock. I benchmarked
the two methods and #reduce: is a bit faster on CogVM. There's no
difference on SqueakVM.


Cheers,
Levente

Reply | Threaded
Open this post in threaded view
|

Re: Collection's #fold: vs #reduce:

Eliot Miranda-2
Hi Levente,

On Tue, Nov 2, 2010 at 10:16 AM, Levente Uzonyi <[hidden email]> wrote:
Hi,

we have these two methods which do exactly the same thing. #reduce: was added by Andreas during the developement of Squeak 4.1. #fold was added by Eliot for Cog VMMaker compatibility. One of them is superfluous. I can image the following solutions:

1) Replace senders of #fold: in VMMaker to use #reduce:, and remove #fold: from the image.

2) Replace #fold:'s implementation to self reduce: aBlock. I benchmarked the two methods and #reduce: is a bit faster on CogVM. There's no difference on SqueakVM.

I noticed the same thing.  I prefer the fold: implementation so I'm bummed it's slightly slower ;)  Personally I like fold: as a name (it's shorter and more cuddly) and since we don't have map: (we have collect:) I don't find the need to use reduce: compelling.  But that's my preference.  I won't object if you replace fold: with reduce:/  I do note that Gilad used fold: in Newspeak.

What do you prefer?

What do others prefer?

I know, choices, choices :)

best
Eliot

 


Cheers,
Levente




Reply | Threaded
Open this post in threaded view
|

Re: Collection's #fold: vs #reduce:

Frank Shearar
On 2010/11/02 19:45, Eliot Miranda wrote:

> Hi Levente,
>
> On Tue, Nov 2, 2010 at 10:16 AM, Levente Uzonyi <[hidden email]
> <mailto:[hidden email]>> wrote:
>
>     Hi,
>
>     we have these two methods which do exactly the same thing. #reduce:
>     was added by Andreas during the developement of Squeak 4.1. #fold
>     was added by Eliot for Cog VMMaker compatibility. One of them is
>     superfluous. I can image the following solutions:
>
>     1) Replace senders of #fold: in VMMaker to use #reduce:, and remove
>     #fold: from the image.
>
>     2) Replace #fold:'s implementation to self reduce: aBlock. I
>     benchmarked the two methods and #reduce: is a bit faster on CogVM.
>     There's no difference on SqueakVM.
>
>
> I noticed the same thing.  I prefer the fold: implementation so I'm
> bummed it's slightly slower ;)  Personally I like fold: as a name (it's
> shorter and more cuddly) and since we don't have map: (we have collect:)
> I don't find the need to use reduce: compelling.  But that's my
> preference.  I won't object if you replace fold: with reduce:/  I do
> note that Gilad used fold: in Newspeak.
>
> What do you prefer?
>
> What do others prefer?
>
> I know, choices, choices :)

I like both, and both are well-entrenched terms in the FP world (fold in
Haskell, and reduce in Common Lisp, for instance).

One extra datum is that there's already a #reduce (Number>>reduce and
Float>>reduce) so perhaps #fold: might be a better name just because it
clashes with nothing in Trunk.

frank

Reply | Threaded
Open this post in threaded view
|

Re: Collection's #fold: vs #reduce:

Levente Uzonyi-2
In reply to this post by Eliot Miranda-2
On Tue, 2 Nov 2010, Eliot Miranda wrote:

> Hi Levente,
>
> On Tue, Nov 2, 2010 at 10:16 AM, Levente Uzonyi <[hidden email]> wrote:
>
>> Hi,
>>
>> we have these two methods which do exactly the same thing. #reduce: was
>> added by Andreas during the developement of Squeak 4.1. #fold was added by
>> Eliot for Cog VMMaker compatibility. One of them is superfluous. I can image
>> the following solutions:
>>
>> 1) Replace senders of #fold: in VMMaker to use #reduce:, and remove #fold:
>> from the image.
>>
>> 2) Replace #fold:'s implementation to self reduce: aBlock. I benchmarked
>> the two methods and #reduce: is a bit faster on CogVM. There's no difference
>> on SqueakVM.
>>
>
> I noticed the same thing.  I prefer the fold: implementation so I'm bummed
> it's slightly slower ;)  Personally I like fold: as a name (it's shorter and

The difference is about 1%, but it's reproducible. And yes, the
implementation of #fold: is a bit nicer. :)

> more cuddly) and since we don't have map: (we have collect:) I don't find
> the need to use reduce: compelling.  But that's my preference.  I won't
> object if you replace fold: with reduce:/  I do note that Gilad used fold:
> in Newspeak.
>
> What do you prefer?

I prefer the second option, because both names are widely used, but
some people are only aware of the one which their previously used
languages have. So this way we can avoid questions like "Why isn't there a
method for folding in Squeak?".


Levente

>
> What do others prefer?
>
> I know, choices, choices :)
>
> best
> Eliot
>
>
>
>>
>>
>> Cheers,
>> Levente
>>
>>
>

Reply | Threaded
Open this post in threaded view
|

Re: Collection's #fold: vs #reduce:

Eliot Miranda-2


On Tue, Nov 2, 2010 at 11:31 AM, Levente Uzonyi <[hidden email]> wrote:
On Tue, 2 Nov 2010, Eliot Miranda wrote:

Hi Levente,

On Tue, Nov 2, 2010 at 10:16 AM, Levente Uzonyi <[hidden email]> wrote:

Hi,

we have these two methods which do exactly the same thing. #reduce: was
added by Andreas during the developement of Squeak 4.1. #fold was added by
Eliot for Cog VMMaker compatibility. One of them is superfluous. I can image
the following solutions:

1) Replace senders of #fold: in VMMaker to use #reduce:, and remove #fold:
from the image.

2) Replace #fold:'s implementation to self reduce: aBlock. I benchmarked
the two methods and #reduce: is a bit faster on CogVM. There's no difference
on SqueakVM.


I noticed the same thing.  I prefer the fold: implementation so I'm bummed
it's slightly slower ;)  Personally I like fold: as a name (it's shorter and

The difference is about 1%, but it's reproducible. And yes, the implementation of #fold: is a bit nicer. :)

The difference is due to the instantiation "Object new", and instantiation is slow on the current Cog because I haven't implemented the instantiation primitives in machine code, and this limitation is hopefully temporary.  So you might leave it be.  However, changing the code to use {} for teh instantiation speeds things up markedly:

Collection>>newFold: binaryBlock
| firstValue nextValue |
firstValue := nextValue := {}.
self do:
[:each |
nextValue := firstValue == nextValue
ifTrue: [each]
ifFalse: [binaryBlock value: nextValue value: each]].
^nextValue == firstValue
ifTrue: [self errorEmptyCollection]
ifFalse: [nextValue]

| c r |
c := #('if' 'it' 'is' 'to' 'be' 'it' 'is' 'up' 'to' 'me').
{ Time millisecondsToRun: [1 to: 1000000 do: [:i| c fold: [:a :b | a, ' ', b]]].
Time millisecondsToRun: [1 to: 1000000 do: [:i| c newFold: [:a :b | a, ' ', b]]].
Time millisecondsToRun: [1 to: 1000000 do: [:i| c reduce: [:a :b | a, ' ', b]]] }
#(5076 5008 5052)


| c |
c := #('if').
{ Time millisecondsToRun: [1 to: 1000000 do: [:i| c fold: [:a :b | a, ' ', b]]].
   Time millisecondsToRun: [1 to: 1000000 do: [:i| c newFold: [:a :b | a, ' ', b]]].
   Time millisecondsToRun: [1 to: 1000000 do: [:i| c reduce: [:a :b | a, ' ', b]]] }
 #(247 226 220)


(N.B.  the above are rough measurements!!).

So how about changing fold: to replace Object new with {} and keep it?


Eliot



more cuddly) and since we don't have map: (we have collect:) I don't find
the need to use reduce: compelling.  But that's my preference.  I won't
object if you replace fold: with reduce:/  I do note that Gilad used fold:
in Newspeak.

What do you prefer?

I prefer the second option, because both names are widely used, but some people are only aware of the one which their previously used languages have. So this way we can avoid questions like "Why isn't there a
method for folding in Squeak?".


Levente



What do others prefer?

I know, choices, choices :)

best
Eliot





Cheers,
Levente







Reply | Threaded
Open this post in threaded view
|

Re: Collection's #fold: vs #reduce:

Nicolas Cellier
In reply to this post by Levente Uzonyi-2
2010/11/2 Levente Uzonyi <[hidden email]>:

>> What do you prefer?
>
> I prefer the second option, because both names are widely used, but some
> people are only aware of the one which their previously used languages have.
> So this way we can avoid questions like "Why isn't there a
> method for folding in Squeak?".
>
>
> Levente
>

I'd say which selector are implemented in Pharo and Cuis ?
We should better converge.

Nicolas

Reply | Threaded
Open this post in threaded view
|

Re: Collection's #fold: vs #reduce:

Andreas.Raab
In reply to this post by Levente Uzonyi-2
On 11/2/2010 11:31 AM, Levente Uzonyi wrote:

>> more cuddly) and since we don't have map: (we have collect:) I don't find
>> the need to use reduce: compelling. But that's my preference. I won't
>> object if you replace fold: with reduce:/ I do note that Gilad used fold:
>> in Newspeak.
>>
>> What do you prefer?
>
> I prefer the second option, because both names are widely used, but some
> people are only aware of the one which their previously used languages
> have. So this way we can avoid questions like "Why isn't there a
> method for folding in Squeak?".

If you go by familiarity, reduce is by far the obvious choice. Most of
the popular dynamic languages use reduce:

PHP: http://php.net/manual/en/function.array-reduce.php
Javascript:
https://developer.mozilla.org/en/JavaScript/Reference/Global_Objects/Array/Reduce
Python: http://docs.python.org/library/functions.html#reduce

What languages use fold?

Cheers,
   - Andreas

Reply | Threaded
Open this post in threaded view
|

Re: Collection's #fold: vs #reduce:

Eliot Miranda-2
In reply to this post by Eliot Miranda-2


On Tue, Nov 2, 2010 at 11:50 AM, Eliot Miranda <[hidden email]> wrote:


On Tue, Nov 2, 2010 at 11:31 AM, Levente Uzonyi <[hidden email]> wrote:
On Tue, 2 Nov 2010, Eliot Miranda wrote:

Hi Levente,

On Tue, Nov 2, 2010 at 10:16 AM, Levente Uzonyi <[hidden email]> wrote:

Hi,

we have these two methods which do exactly the same thing. #reduce: was
added by Andreas during the developement of Squeak 4.1. #fold was added by
Eliot for Cog VMMaker compatibility. One of them is superfluous. I can image
the following solutions:

1) Replace senders of #fold: in VMMaker to use #reduce:, and remove #fold:
from the image.

2) Replace #fold:'s implementation to self reduce: aBlock. I benchmarked
the two methods and #reduce: is a bit faster on CogVM. There's no difference
on SqueakVM.


I noticed the same thing.  I prefer the fold: implementation so I'm bummed
it's slightly slower ;)  Personally I like fold: as a name (it's shorter and

The difference is about 1%, but it's reproducible. And yes, the implementation of #fold: is a bit nicer. :)

The difference is due to the instantiation "Object new", and instantiation is slow on the current Cog because I haven't implemented the instantiation primitives in machine code, and this limitation is hopefully temporary.  So you might leave it be.  However, changing the code to use {} for teh instantiation speeds things up markedly:

Collection>>newFold: binaryBlock
| firstValue nextValue |
firstValue := nextValue := {}.
self do:
[:each |
nextValue := firstValue == nextValue
ifTrue: [each]
ifFalse: [binaryBlock value: nextValue value: each]].
^nextValue == firstValue
ifTrue: [self errorEmptyCollection]
ifFalse: [nextValue]

| c r |
c := #('if' 'it' 'is' 'to' 'be' 'it' 'is' 'up' 'to' 'me').
{ Time millisecondsToRun: [1 to: 1000000 do: [:i| c fold: [:a :b | a, ' ', b]]].
Time millisecondsToRun: [1 to: 1000000 do: [:i| c newFold: [:a :b | a, ' ', b]]].
Time millisecondsToRun: [1 to: 1000000 do: [:i| c reduce: [:a :b | a, ' ', b]]] }
#(5076 5008 5052)


| c |
c := #('if').
{ Time millisecondsToRun: [1 to: 1000000 do: [:i| c fold: [:a :b | a, ' ', b]]].
   Time millisecondsToRun: [1 to: 1000000 do: [:i| c newFold: [:a :b | a, ' ', b]]].
   Time millisecondsToRun: [1 to: 1000000 do: [:i| c reduce: [:a :b | a, ' ', b]]] }
 #(247 226 220)


(N.B.  the above are rough measurements!!).

So how about changing fold: to replace Object new with {} and keep it?

And of course one can consider using thisContext instead of Object new or {}, since thisContext has to be instantiated anyway for the closure in fold:, and so results in less memory pressure.  This needs careful measurement; you have the test bed set up for that; my workspace hacks are showing too much variability to distinguish.

This is too ugly to contemplate, but at first blush it seems faster:

fold: binaryBlock
"Evaluate the block with the first two elements of the receiver,
then with the result of the first evaluation and the next element,
and so on.  Answer the result of the final evaluation. If the receiver
is empty, raise an error. If the receiver has a single element, answer
that element."
"#('if' 'it' 'is' 'to' 'be' 'it' 'is' 'up' 'to' 'me') fold: [:a :b | a, ' ', b]"

| firstValue nextValue |
firstValue := nextValue := "something that can't be in the receiver"
[:each |
nextValue := firstValue == nextValue
ifTrue: [each]
ifFalse: [binaryBlock value: nextValue value: each]].
self do: firstValue.
^nextValue == firstValue
ifTrue: [self errorEmptyCollection]
ifFalse: [nextValue]

 


Eliot



more cuddly) and since we don't have map: (we have collect:) I don't find
the need to use reduce: compelling.  But that's my preference.  I won't
object if you replace fold: with reduce:/  I do note that Gilad used fold:
in Newspeak.

What do you prefer?

I prefer the second option, because both names are widely used, but some people are only aware of the one which their previously used languages have. So this way we can avoid questions like "Why isn't there a
method for folding in Squeak?".


Levente



What do others prefer?

I know, choices, choices :)

best
Eliot





Cheers,
Levente








Reply | Threaded
Open this post in threaded view
|

Re: Collection's #fold: vs #reduce:

Eliot Miranda-2


On Tue, Nov 2, 2010 at 12:25 PM, Eliot Miranda <[hidden email]> wrote:


On Tue, Nov 2, 2010 at 11:50 AM, Eliot Miranda <[hidden email]> wrote:


On Tue, Nov 2, 2010 at 11:31 AM, Levente Uzonyi <[hidden email]> wrote:
On Tue, 2 Nov 2010, Eliot Miranda wrote:

Hi Levente,

On Tue, Nov 2, 2010 at 10:16 AM, Levente Uzonyi <[hidden email]> wrote:

Hi,

we have these two methods which do exactly the same thing. #reduce: was
added by Andreas during the developement of Squeak 4.1. #fold was added by
Eliot for Cog VMMaker compatibility. One of them is superfluous. I can image
the following solutions:

1) Replace senders of #fold: in VMMaker to use #reduce:, and remove #fold:
from the image.

2) Replace #fold:'s implementation to self reduce: aBlock. I benchmarked
the two methods and #reduce: is a bit faster on CogVM. There's no difference
on SqueakVM.


I noticed the same thing.  I prefer the fold: implementation so I'm bummed
it's slightly slower ;)  Personally I like fold: as a name (it's shorter and

The difference is about 1%, but it's reproducible. And yes, the implementation of #fold: is a bit nicer. :)

The difference is due to the instantiation "Object new", and instantiation is slow on the current Cog because I haven't implemented the instantiation primitives in machine code, and this limitation is hopefully temporary.  So you might leave it be.  However, changing the code to use {} for teh instantiation speeds things up markedly:

Collection>>newFold: binaryBlock
| firstValue nextValue |
firstValue := nextValue := {}.
self do:
[:each |
nextValue := firstValue == nextValue
ifTrue: [each]
ifFalse: [binaryBlock value: nextValue value: each]].
^nextValue == firstValue
ifTrue: [self errorEmptyCollection]
ifFalse: [nextValue]

| c r |
c := #('if' 'it' 'is' 'to' 'be' 'it' 'is' 'up' 'to' 'me').
{ Time millisecondsToRun: [1 to: 1000000 do: [:i| c fold: [:a :b | a, ' ', b]]].
Time millisecondsToRun: [1 to: 1000000 do: [:i| c newFold: [:a :b | a, ' ', b]]].
Time millisecondsToRun: [1 to: 1000000 do: [:i| c reduce: [:a :b | a, ' ', b]]] }
#(5076 5008 5052)


| c |
c := #('if').
{ Time millisecondsToRun: [1 to: 1000000 do: [:i| c fold: [:a :b | a, ' ', b]]].
   Time millisecondsToRun: [1 to: 1000000 do: [:i| c newFold: [:a :b | a, ' ', b]]].
   Time millisecondsToRun: [1 to: 1000000 do: [:i| c reduce: [:a :b | a, ' ', b]]] }
 #(247 226 220)


(N.B.  the above are rough measurements!!).

So how about changing fold: to replace Object new with {} and keep it?

And of course one can consider using thisContext instead of Object new or {}, since thisContext has to be instantiated anyway for the closure in fold:, and so results in less memory pressure.  This needs careful measurement; you have the test bed set up for that; my workspace hacks are showing too much variability to distinguish.

This is too ugly to contemplate, but at first blush it seems faster:

fold: binaryBlock
"Evaluate the block with the first two elements of the receiver,
then with the result of the first evaluation and the next element,
and so on.  Answer the result of the final evaluation. If the receiver
is empty, raise an error. If the receiver has a single element, answer
that element."
"#('if' 'it' 'is' 'to' 'be' 'it' 'is' 'up' 'to' 'me') fold: [:a :b | a, ' ', b]"

| firstValue nextValue |
firstValue := nextValue := "something that can't be in the receiver"
[:each |
nextValue := firstValue == nextValue
ifTrue: [each]
ifFalse: [binaryBlock value: nextValue value: each]].
self do: firstValue.
^nextValue == firstValue
ifTrue: [self errorEmptyCollection]
ifFalse: [nextValue]


and if you're really whoorish about speed this is even faster (because the Cog code generator is so naive)

fold: binaryBlock
"Evaluate the block with the first two elements of the receiver,
then with the result of the first evaluation and the next element,
and so on.  Answer the result of the final evaluation. If the receiver
is empty, raise an error. If the receiver has a single element, answer
that element."
"#('if' 'it' 'is' 'to' 'be' 'it' 'is' 'up' 'to' 'me') fold: [:a :b | a, ' ', b]"

| firstValue nextValue |
self do: (firstValue := nextValue := "something that can't be in the receiver"
[:each |
nextValue := firstValue == nextValue
ifTrue: [each]
ifFalse: [binaryBlock value: nextValue value: each]]).
^nextValue == firstValue
ifTrue: [self errorEmptyCollection]
ifFalse: [nextValue] 
 


Eliot



more cuddly) and since we don't have map: (we have collect:) I don't find
the need to use reduce: compelling.  But that's my preference.  I won't
object if you replace fold: with reduce:/  I do note that Gilad used fold:
in Newspeak.

What do you prefer?

I prefer the second option, because both names are widely used, but some people are only aware of the one which their previously used languages have. So this way we can avoid questions like "Why isn't there a
method for folding in Squeak?".


Levente



What do others prefer?

I know, choices, choices :)

best
Eliot





Cheers,
Levente









Reply | Threaded
Open this post in threaded view
|

Re: Collection's #fold: vs #reduce:

Andreas.Raab
In reply to this post by Eliot Miranda-2
On 11/2/2010 12:25 PM, Eliot Miranda wrote:
> And of course one can consider using thisContext instead of Object new
> or {}, since thisContext has to be instantiated anyway for the closure
> in fold:, and so results in less memory pressure.  This needs careful
> measurement; you have the test bed set up for that; my workspace hacks
> are showing too much variability to distinguish.

I'm sorry but if your application is limited by the performance of
reduce/fold you're doing it wrong. Can we discuss *clarity* of the
solution over performance please? If we'd really need a high-performance
variant, then *nothing* beats a trivial custom implementation in Array
along the lines of:

Array>>reduce: aBlock

   result := self at: 1.
   2 to: self size do:[:i| result := aBlock value: result value: (self
at: 1)].

-- "Premature optimization is the root of all evil" --

Cheers,
   - Andreas

Reply | Threaded
Open this post in threaded view
|

Re: Collection's #fold: vs #reduce:

Eliot Miranda-2


On Tue, Nov 2, 2010 at 12:37 PM, Andreas Raab <[hidden email]> wrote:
On 11/2/2010 12:25 PM, Eliot Miranda wrote:
And of course one can consider using thisContext instead of Object new
or {}, since thisContext has to be instantiated anyway for the closure
in fold:, and so results in less memory pressure.  This needs careful
measurement; you have the test bed set up for that; my workspace hacks
are showing too much variability to distinguish.

I'm sorry but if your application is limited by the performance of reduce/fold you're doing it wrong. Can we discuss *clarity* of the solution over performance please? If we'd really need a high-performance variant, then *nothing* beats a trivial custom implementation in Array along the lines of:

Array>>reduce: aBlock

 result := self at: 1.
 2 to: self size do:[:i| result := aBlock value: result value: (self at: 1)].

-- "Premature optimization is the root of all evil" --

I know.  I'm joking ;)  It's still interesting to see how code generator fallibilities make measurable differences.  I did say I thought the whoorish speed differences were too ugly to contemplate.

best
Eliot
 

Cheers,
 - Andreas




Reply | Threaded
Open this post in threaded view
|

Re: Collection's #fold: vs #reduce:

Levente Uzonyi-2
In reply to this post by Andreas.Raab
On Tue, 2 Nov 2010, Andreas Raab wrote:

> On 11/2/2010 11:31 AM, Levente Uzonyi wrote:
>>> more cuddly) and since we don't have map: (we have collect:) I don't find
>>> the need to use reduce: compelling. But that's my preference. I won't
>>> object if you replace fold: with reduce:/ I do note that Gilad used fold:
>>> in Newspeak.
>>>
>>> What do you prefer?
>>
>> I prefer the second option, because both names are widely used, but some
>> people are only aware of the one which their previously used languages
>> have. So this way we can avoid questions like "Why isn't there a
>> method for folding in Squeak?".
>
> If you go by familiarity, reduce is by far the obvious choice. Most of the
> popular dynamic languages use reduce:
>
> PHP: http://php.net/manual/en/function.array-reduce.php
> Javascript:
> https://developer.mozilla.org/en/JavaScript/Reference/Global_Objects/Array/Reduce
> Python: http://docs.python.org/library/functions.html#reduce
>
> What languages use fold?

Mostly functional languages. Here is a list of popular languages, that
implement fold/reduce:
http://en.wikipedia.org/wiki/Fold_(higher-order_function)#Implementation

So users of Erlang, F#, Haskell, OCaml, Scala, SML will probably look for
#fold:.


Levente

>
> Cheers,
>  - Andreas
>
>

Reply | Threaded
Open this post in threaded view
|

Re: Collection's #fold: vs #reduce:

Juan Vuletich-4
In reply to this post by Nicolas Cellier
Nicolas Cellier wrote:
>
> I'd say which selector are implemented in Pharo and Cuis ?
> We should better converge.
>
> Nicolas
>  

Hi folks,

Cuis doesn't include any of them yet. I can add whatever people prefer.
What I'd do different is the implementation:

fold: aBinaryBlock
    "Evaluate the block with the first two elements of the receiver,
     then with the result of the first evaluation and the next element,
     and so on.  Answer the result of the final evaluation. If the receiver
     is empty, raise an error. If the receiver has a single element, answer
     that element."
    "
    #('if' 'it' 'is' 'to' 'be' 'it' 'is' 'up' 'to' 'me') fold: [:a :b |
a, ' ', b]
    "
    | noPreviousValue |
    noPreviousValue := Object new.    "something that can't be in the
receiver"
    ^self inject: noPreviousValue into: [ :previousValue :each |
        previousValue == noPreviousValue
            ifTrue: [ each ]
            ifFalse: [ aBinaryBlock value: previousValue value: each ]]

This is easier to understand, and it also makes clear the relation
between #fold: (or #reduce:) and #inject:into: .

Cheers,
Juan Vuletich

Reply | Threaded
Open this post in threaded view
|

Re: Collection's #fold: vs #reduce:

Eliot Miranda-2


On Tue, Nov 2, 2010 at 1:23 PM, Juan Vuletich <[hidden email]> wrote:
Nicolas Cellier wrote:

I'd say which selector are implemented in Pharo and Cuis ?
We should better converge.

Nicolas
 

Hi folks,

Cuis doesn't include any of them yet. I can add whatever people prefer. What I'd do different is the implementation:

fold: aBinaryBlock

  "Evaluate the block with the first two elements of the receiver,
   then with the result of the first evaluation and the next element,
   and so on.  Answer the result of the final evaluation. If the receiver
   is empty, raise an error. If the receiver has a single element, answer
   that element."
  "
  #('if' 'it' 'is' 'to' 'be' 'it' 'is' 'up' 'to' 'me') fold: [:a :b | a, ' ', b]
  "
  | noPreviousValue |
  noPreviousValue := Object new.    "something that can't be in the receiver"
  ^self inject: noPreviousValue into: [ :previousValue :each |
      previousValue == noPreviousValue
          ifTrue: [ each ]
          ifFalse: [ aBinaryBlock value: previousValue value: each ]]

This is easier to understand, and it also makes clear the relation between #fold: (or #reduce:) and #inject:into: .

I disagree.   inject:into: is not particularly easy to understand, whereas both the existing fold: and reduce: are understandable in terms of do:. Also the use of inject:into: isn't really buying you anything since the logic in the block within the inject:into: usage is as complex as that within fold: or reduce:. Further, this /is/ a lot slower because of the extra level of evaluation (there's an extra block activation around each element).
 

best,
Eliot


Cheers,
Juan Vuletich




Reply | Threaded
Open this post in threaded view
|

Re: Collection's #fold: vs #reduce:

Juan Vuletich-4
Eliot Miranda wrote:

>
>
> On Tue, Nov 2, 2010 at 1:23 PM, Juan Vuletich <[hidden email]
> <mailto:[hidden email]>> wrote:
>
>     Nicolas Cellier wrote:
>
>
>         I'd say which selector are implemented in Pharo and Cuis ?
>         We should better converge.
>
>         Nicolas
>          
>
>
>     Hi folks,
>
>     Cuis doesn't include any of them yet. I can add whatever people
>     prefer. What I'd do different is the implementation:
>
>     fold: aBinaryBlock
>
>       "Evaluate the block with the first two elements of the receiver,
>        then with the result of the first evaluation and the next element,
>        and so on.  Answer the result of the final evaluation. If the
>     receiver
>        is empty, raise an error. If the receiver has a single element,
>     answer
>        that element."
>       "
>       #('if' 'it' 'is' 'to' 'be' 'it' 'is' 'up' 'to' 'me') fold: [:a
>     :b | a, ' ', b]
>       "
>       | noPreviousValue |
>       noPreviousValue := Object new.    "something that can't be in
>     the receiver"
>       ^self inject: noPreviousValue into: [ :previousValue :each |
>           previousValue == noPreviousValue
>               ifTrue: [ each ]
>               ifFalse: [ aBinaryBlock value: previousValue value: each ]]
>
>     This is easier to understand, and it also makes clear the relation
>     between #fold: (or #reduce:) and #inject:into: .
>
>
> I disagree.   inject:into: is not particularly easy to understand,
> whereas both the existing fold: and reduce: are understandable in
> terms of do:.

I say this is easier to understand, given that #inject:into: is already
understood. #inject:into: and #fold: / #reduce: are very close in what
they do. So close, that it is best to show how they differ. That's what
my implementation does. Showing the use of #inject:into:  (with proper
names for block args) is a didactic bonus.

> Also the use of inject:into: isn't really buying you anything since
> the logic in the block within the inject:into: usage is as complex as
> that within fold: or reduce:. Further, this /is/ a lot slower because
> of the extra level of evaluation (there's an extra block activation
> around each element).
>  
>
> best,
> Eliot

No, it is not slower:
    [100000 timesRepeat: [#('if' 'it' 'is' 'to' 'be' 'it' 'is' 'up' 'to'
'me') fold: [:a :b | a, ' ', b]]] timeToRun 2710 | 879
    [100000 timesRepeat: [#('if' 'it' 'is' 'to' 'be' 'it' 'is' 'up' 'to'
'me') foldx: [:a :b | a, ' ', b]]] timeToRun 2723 | 874
    [100000 timesRepeat: [#('if' 'it' 'is' 'to' 'be' 'it' 'is' 'up' 'to'
'me') reduce: [:a :b | a, ' ', b]]] timeToRun 2780 | 881

#foldx: is my implementation. In all cases, the first number is regular
interpreter Squeak 4.2.4beta1U.app, the second number is Cog Squeak
5.8b10-1.app on a 1.6GHz Mac mini.

The argument about complexity is more aesthetic than anything. My
version doesn't need assignment to outer temps, and I see that 'cleaner'
and closer to FP.

Cheers,
Juan Vuletich


Reply | Threaded
Open this post in threaded view
|

Re: Collection's #fold: vs #reduce:

Eliot Miranda-2


On Tue, Nov 2, 2010 at 1:50 PM, Juan Vuletich <[hidden email]> wrote:
Eliot Miranda wrote:



On Tue, Nov 2, 2010 at 1:23 PM, Juan Vuletich <[hidden email] <mailto:[hidden email]>> wrote:

   Nicolas Cellier wrote:


       I'd say which selector are implemented in Pharo and Cuis ?
       We should better converge.

       Nicolas
       

   Hi folks,

   Cuis doesn't include any of them yet. I can add whatever people
   prefer. What I'd do different is the implementation:

   fold: aBinaryBlock

     "Evaluate the block with the first two elements of the receiver,
      then with the result of the first evaluation and the next element,
      and so on.  Answer the result of the final evaluation. If the
   receiver
      is empty, raise an error. If the receiver has a single element,
   answer
      that element."
     "
     #('if' 'it' 'is' 'to' 'be' 'it' 'is' 'up' 'to' 'me') fold: [:a
   :b | a, ' ', b]
     "
     | noPreviousValue |
     noPreviousValue := Object new.    "something that can't be in
   the receiver"
     ^self inject: noPreviousValue into: [ :previousValue :each |
         previousValue == noPreviousValue
             ifTrue: [ each ]
             ifFalse: [ aBinaryBlock value: previousValue value: each ]]

   This is easier to understand, and it also makes clear the relation
   between #fold: (or #reduce:) and #inject:into: .


I disagree.   inject:into: is not particularly easy to understand, whereas both the existing fold: and reduce: are understandable in terms of do:.

I say this is easier to understand, given that #inject:into: is already understood. #inject:into: and #fold: / #reduce: are very close in what they do. So close, that it is best to show how they differ. That's what my implementation does. Showing the use of #inject:into:  (with proper names for block args) is a didactic bonus.


Also the use of inject:into: isn't really buying you anything since the logic in the block within the inject:into: usage is as complex as that within fold: or reduce:. Further, this /is/ a lot slower because of the extra level of evaluation (there's an extra block activation around each element).
 
best,
Eliot

No, it is not slower:
  [100000 timesRepeat: [#('if' 'it' 'is' 'to' 'be' 'it' 'is' 'up' 'to' 'me') fold: [:a :b | a, ' ', b]]] timeToRun 2710 | 879
  [100000 timesRepeat: [#('if' 'it' 'is' 'to' 'be' 'it' 'is' 'up' 'to' 'me') foldx: [:a :b | a, ' ', b]]] timeToRun 2723 | 874
  [100000 timesRepeat: [#('if' 'it' 'is' 'to' 'be' 'it' 'is' 'up' 'to' 'me') reduce: [:a :b | a, ' ', b]]] timeToRun 2780 | 881

Be very careful to run these under the same conditions.  You'd need to run each of these immediately following a GC to get meaningful results.  Also use 1 to: n do for the outer loop since timesRepeat: uses do: and so the measurement harness is probably a significant fraction of the run-time.  Also measure the single-element cost.  BTW, I would use (1 to: 10) fold: [:a :b| a + b] to reduce the overhead of string concatenation & allocation and focus more on the costs of the various fold: implementations.



#foldx: is my implementation. In all cases, the first number is regular interpreter Squeak 4.2.4beta1U.app, the second number is Cog Squeak 5.8b10-1.app on a 1.6GHz Mac mini.

The argument about complexity is more aesthetic than anything.

Indeed; and so a matter of taste.  My aesthetics end up preferring fold: over the inject:into: based one :/

 
My version doesn't need assignment to outer temps, and I see that 'cleaner' and closer to FP.


But the assignment to the outer temp is hidden in inject:into:; it's still there ;)

Cheers,
Juan Vuletich





Reply | Threaded
Open this post in threaded view
|

Re: Collection's #fold: vs #reduce:

Igor Stasenko
In reply to this post by Juan Vuletich-4
.snip.

>
> fold: aBinaryBlock
>   "Evaluate the block with the first two elements of the receiver,
>    then with the result of the first evaluation and the next element,
>    and so on.  Answer the result of the final evaluation. If the receiver
>    is empty, raise an error. If the receiver has a single element, answer
>    that element."
>   "
>   #('if' 'it' 'is' 'to' 'be' 'it' 'is' 'up' 'to' 'me') fold: [:a :b | a, '
> ', b]

just a 0.5 penny..

wow.. you trying to get a fastest way to fold a string,
by using a concatenaiton.. i think its not most effective way, at
least for given example :)

I think that using streams, it will be much faster.

(String streamContents: [:s |
  (#('if' 'it' 'is' 'to' 'be' 'it' 'is' 'up' 'to' 'me') do: [:each | s
nextPutAll: each; nextPut: $  ]) allButLast



--
Best regards,
Igor Stasenko AKA sig.

Reply | Threaded
Open this post in threaded view
|

Re: Collection's #fold: vs #reduce:

Levente Uzonyi-2
On Tue, 2 Nov 2010, Igor Stasenko wrote:

> .snip.
>>
>> fold: aBinaryBlock
>>   "Evaluate the block with the first two elements of the receiver,
>>    then with the result of the first evaluation and the next element,
>>    and so on.  Answer the result of the final evaluation. If the receiver
>>    is empty, raise an error. If the receiver has a single element, answer
>>    that element."
>>   "
>>   #('if' 'it' 'is' 'to' 'be' 'it' 'is' 'up' 'to' 'me') fold: [:a :b | a, '
>> ', b]
>
> just a 0.5 penny..
>
> wow.. you trying to get a fastest way to fold a string,
> by using a concatenaiton.. i think its not most effective way, at
> least for given example :)
I'm sure nobody is trying that. It's just an example in a comment. And
this thread is not meant to be about performance at all. We have two
implementations of the same thing. To simplify the system, I thought that
I'll remove one of them. Both versions are easy to understand, so I though
it will be a good idea to keep the faster one.


Levente

>
> I think that using streams, it will be much faster.
>
> (String streamContents: [:s |
>  (#('if' 'it' 'is' 'to' 'be' 'it' 'is' 'up' 'to' 'me') do: [:each | s
> nextPutAll: each; nextPut: $  ]) allButLast
>
>
>
> --
> Best regards,
> Igor Stasenko AKA sig.
>
>

Reply | Threaded
Open this post in threaded view
|

Re: Collection's #fold: vs #reduce:

Juan Vuletich-4
In reply to this post by Eliot Miranda-2
Eliot Miranda wrote:

>
>
> On Tue, Nov 2, 2010 at 1:50 PM, Juan Vuletich <[hidden email]
> <mailto:[hidden email]>> wrote:
>
>     Eliot Miranda wrote:
>
>
>
>         On Tue, Nov 2, 2010 at 1:23 PM, Juan Vuletich
>         <[hidden email] <mailto:[hidden email]>
>         <mailto:[hidden email] <mailto:[hidden email]>>> wrote:
>
>            Nicolas Cellier wrote:
>
>
>                I'd say which selector are implemented in Pharo and Cuis ?
>                We should better converge.
>
>                Nicolas
>                
>
>            Hi folks,
>
>            Cuis doesn't include any of them yet. I can add whatever people
>            prefer. What I'd do different is the implementation:
>
>            fold: aBinaryBlock
>
>              "Evaluate the block with the first two elements of the
>         receiver,
>               then with the result of the first evaluation and the
>         next element,
>               and so on.  Answer the result of the final evaluation.
>         If the
>            receiver
>               is empty, raise an error. If the receiver has a single
>         element,
>            answer
>               that element."
>              "
>              #('if' 'it' 'is' 'to' 'be' 'it' 'is' 'up' 'to' 'me')
>         fold: [:a
>            :b | a, ' ', b]
>              "
>              | noPreviousValue |
>              noPreviousValue := Object new.    "something that can't be in
>            the receiver"
>              ^self inject: noPreviousValue into: [ :previousValue :each |
>                  previousValue == noPreviousValue
>                      ifTrue: [ each ]
>                      ifFalse: [ aBinaryBlock value: previousValue
>         value: each ]]
>
>            This is easier to understand, and it also makes clear the
>         relation
>            between #fold: (or #reduce:) and #inject:into: .
>
>
>         I disagree.   inject:into: is not particularly easy to
>         understand, whereas both the existing fold: and reduce: are
>         understandable in terms of do:.
>
>
>     I say this is easier to understand, given that #inject:into: is
>     already understood. #inject:into: and #fold: / #reduce: are very
>     close in what they do. So close, that it is best to show how they
>     differ. That's what my implementation does. Showing the use of
>     #inject:into:  (with proper names for block args) is a didactic
>     bonus.
>
>
>         Also the use of inject:into: isn't really buying you anything
>         since the logic in the block within the inject:into: usage is
>         as complex as that within fold: or reduce:. Further, this /is/
>         a lot slower because of the extra level of evaluation (there's
>         an extra block activation around each element).
>
>          
>         best,
>         Eliot
>
>
>     No, it is not slower:
>       [100000 timesRepeat: [#('if' 'it' 'is' 'to' 'be' 'it' 'is' 'up'
>     'to' 'me') fold: [:a :b | a, ' ', b]]] timeToRun 2710 | 879
>       [100000 timesRepeat: [#('if' 'it' 'is' 'to' 'be' 'it' 'is' 'up'
>     'to' 'me') foldx: [:a :b | a, ' ', b]]] timeToRun 2723 | 874
>       [100000 timesRepeat: [#('if' 'it' 'is' 'to' 'be' 'it' 'is' 'up'
>     'to' 'me') reduce: [:a :b | a, ' ', b]]] timeToRun 2780 | 881
>
>
> Be very careful to run these under the same conditions.  You'd need to
> run each of these immediately following a GC to get meaningful
> results.  Also use 1 to: n do for the outer loop since timesRepeat:
> uses do: and so the measurement harness is probably a significant
> fraction of the run-time.  Also measure the single-element cost.  BTW,
> I would use (1 to: 10) fold: [:a :b| a + b] to reduce the overhead of
> string concatenation & allocation and focus more on the costs of the
> various fold: implementations.

You're right wrt performance (only tried Cog on this example):
    Smalltalk garbageCollect. [1 to: 200000 do: [ :i | (1 to: 100) fold:
[:a :b| a + b] ]] timeToRun 1367
    Smalltalk garbageCollect. [1 to: 200000 do: [ :i | (1 to: 100)
foldx: [:a :b| a + b] ]] timeToRun 1656
    Smalltalk garbageCollect. [1 to: 200000 do: [ :i | (1 to: 100)
reduce: [:a :b| a + b] ]] timeToRun 1306

>
>
>
>     #foldx: is my implementation. In all cases, the first number is
>     regular interpreter Squeak 4.2.4beta1U.app, the second number is
>     Cog Squeak 5.8b10-1.app on a 1.6GHz Mac mini.
>
>     The argument about complexity is more aesthetic than anything.
>
>
> Indeed; and so a matter of taste.  My aesthetics end up preferring
> fold: over the inject:into: based one :/
>
>  
>
>     My version doesn't need assignment to outer temps, and I see that
>     'cleaner' and closer to FP.
>
>
>
> But the assignment to the outer temp is hidden in inject:into:; it's
> still there ;)

:)

Cheers,
Juan Vuletich

Reply | Threaded
Open this post in threaded view
|

Re: Collection's #fold: vs #reduce:

Eliot Miranda-2
Hi Juan,

On Tue, Nov 2, 2010 at 2:35 PM, Juan Vuletich <[hidden email]> wrote:
Eliot Miranda wrote:


On Tue, Nov 2, 2010 at 1:50 PM, Juan Vuletich <[hidden email] <mailto:[hidden email]>> wrote:

   Eliot Miranda wrote:



       On Tue, Nov 2, 2010 at 1:23 PM, Juan Vuletich
       <[hidden email] <mailto:[hidden email]>
       <mailto:[hidden email] <mailto:[hidden email]>>> wrote:

          Nicolas Cellier wrote:


              I'd say which selector are implemented in Pharo and Cuis ?
              We should better converge.

              Nicolas
             
          Hi folks,

          Cuis doesn't include any of them yet. I can add whatever people
          prefer. What I'd do different is the implementation:

          fold: aBinaryBlock

            "Evaluate the block with the first two elements of the
       receiver,
             then with the result of the first evaluation and the
       next element,
             and so on.  Answer the result of the final evaluation.
       If the
          receiver
             is empty, raise an error. If the receiver has a single
       element,
          answer
             that element."
            "
            #('if' 'it' 'is' 'to' 'be' 'it' 'is' 'up' 'to' 'me')
       fold: [:a
          :b | a, ' ', b]
            "
            | noPreviousValue |
            noPreviousValue := Object new.    "something that can't be in
          the receiver"
            ^self inject: noPreviousValue into: [ :previousValue :each |
                previousValue == noPreviousValue
                    ifTrue: [ each ]
                    ifFalse: [ aBinaryBlock value: previousValue
       value: each ]]

          This is easier to understand, and it also makes clear the
       relation
          between #fold: (or #reduce:) and #inject:into: .


       I disagree.   inject:into: is not particularly easy to
       understand, whereas both the existing fold: and reduce: are
       understandable in terms of do:.


   I say this is easier to understand, given that #inject:into: is
   already understood. #inject:into: and #fold: / #reduce: are very
   close in what they do. So close, that it is best to show how they
   differ. That's what my implementation does. Showing the use of
   #inject:into:  (with proper names for block args) is a didactic
   bonus.


       Also the use of inject:into: isn't really buying you anything
       since the logic in the block within the inject:into: usage is
       as complex as that within fold: or reduce:. Further, this /is/
       a lot slower because of the extra level of evaluation (there's
       an extra block activation around each element).

                best,
       Eliot


   No, it is not slower:
     [100000 timesRepeat: [#('if' 'it' 'is' 'to' 'be' 'it' 'is' 'up'
   'to' 'me') fold: [:a :b | a, ' ', b]]] timeToRun 2710 | 879
     [100000 timesRepeat: [#('if' 'it' 'is' 'to' 'be' 'it' 'is' 'up'
   'to' 'me') foldx: [:a :b | a, ' ', b]]] timeToRun 2723 | 874
     [100000 timesRepeat: [#('if' 'it' 'is' 'to' 'be' 'it' 'is' 'up'
   'to' 'me') reduce: [:a :b | a, ' ', b]]] timeToRun 2780 | 881


Be very careful to run these under the same conditions.  You'd need to run each of these immediately following a GC to get meaningful results.  Also use 1 to: n do for the outer loop since timesRepeat: uses do: and so the measurement harness is probably a significant fraction of the run-time.  Also measure the single-element cost.  BTW, I would use (1 to: 10) fold: [:a :b| a + b] to reduce the overhead of string concatenation & allocation and focus more on the costs of the various fold: implementations.

You're right wrt performance (only tried Cog on this example):
  Smalltalk garbageCollect. [1 to: 200000 do: [ :i | (1 to: 100) fold: [:a :b| a + b] ]] timeToRun 1367
  Smalltalk garbageCollect. [1 to: 200000 do: [ :i | (1 to: 100) foldx: [:a :b| a + b] ]] timeToRun 1656
  Smalltalk garbageCollect. [1 to: 200000 do: [ :i | (1 to: 100) reduce: [:a :b| a + b] ]] timeToRun 1306

Yes, alas one has to be hyper-careful.  For example if I run the following in my current image:

Smalltalk garbageCollect.
{ [1 to: 1000000 do: [ :i | (1 to: 100) fold: [:a :b| a + b] ]] timeToRun.
  [1 to: 1000000 do: [ :i | (1 to: 100) foldAlt: [:a :b| a + b] ]] timeToRun.
  [1 to: 1000000 do: [ :i | (1 to: 100) reduce: [:a :b| a + b] ]] timeToRun }

chances are that the second one will be fastest, irrespective of which implementation is used.  So change the order to foldAlt:, fold: reduce: and fold: will win, etc.

Levente is very careful to measure the implementations precisely, so we can trust his numbers.


cheers,
Eliot





   #foldx: is my implementation. In all cases, the first number is
   regular interpreter Squeak 4.2.4beta1U.app, the second number is
   Cog Squeak 5.8b10-1.app on a 1.6GHz Mac mini.

   The argument about complexity is more aesthetic than anything.


Indeed; and so a matter of taste.  My aesthetics end up preferring fold: over the inject:into: based one :/

 
   My version doesn't need assignment to outer temps, and I see that
   'cleaner' and closer to FP.



But the assignment to the outer temp is hidden in inject:into:; it's still there ;)

:)

Cheers,
Juan Vuletich




12