Smalltalk › Pharo › Pharo Smalltalk Users

Set >> collect:thenDo:

_‹ Previous Topic Next Topic _›

Classic

List

Threaded

15 messages Options

Herby Vojčík

Set >> collect:thenDo:

Hello!

----
(#(1 2 3) asSet collect: #odd)
do: [ :each | Transcript show: each; cr ]

> true
> false

----

#(1 2 3) asSet collect: #odd
thenDo: [ :each | Transcript show: each; cr ]

> true
> false
> true
----

Bug or feature?

Herby

Kasper Osterbye

Re: Set >> collect:thenDo:

Feature!

collect: forms a new collection of the same kind as its receiver. In this case a set. As the result of your collect:

#(1 2 3) asSet collect: #odd)

is booleans, the resulting set will contain only to elements (the duplicate odd resulting in true is removed).

collect: thenDo: applies the collect-block to each element of the receiver, and then applies the do to each of those results. You can see the implementation of collect:thenDo: in class Collection.

Best,

Kasper

On 7 September 2019 at 17.22.03, Herby Vojčík ([hidden email]) wrote:

Hello!

----
(#(1 2 3) asSet collect: #odd)
do: [ :each | Transcript show: each; cr ]

> true
> false

----

#(1 2 3) asSet collect: #odd
thenDo: [ :each | Transcript show: each; cr ]

> true
> false
> true
----

Bug or feature?

Herby

Steffen Märcker

Re: Set >> collect:thenDo:

Surprise!

The selector #collect:thenDo: strongly suggests that it behaves just as #collect: then #do:. But as #collect: usually means map + aggregate in the reveiver type, I'd expect the input to the do block to be deduped already. At least it is an easy to miss source of subtle bugs. Maybe an additional method #map:thenDo: would make sense?

Best, Steffen

Am 8. September 2019 08:02:30 MESZ schrieb "Kasper Østerbye" <[hidden email]>:

Feature!

collect: forms a new collection of the same kind as its receiver. In this case a set. As the result of your collect:
#(1 2 3) asSet collect: #odd)
is booleans, the resulting set will contain only to elements (the duplicate odd resulting in true is removed).

collect: thenDo: applies the collect-block to each element of the receiver, and then applies the do to each of those results. You can see the implementation of collect:thenDo: in class Collection.

Best,

Kasper

On 7 September 2019 at 17.22.03, Herby Vojčík ([hidden email]) wrote:

Hello!

----
(#(1 2 3) asSet collect: #odd)
do: [ :each | Transcript show: each; cr ]

> true
> false

----

#(1 2 3) asSet collect: #odd
thenDo: [ :each | Transcript show: each; cr ]

> true
> false
> true
----

Bug or feature?

Herby

Richard Sargent

Re: Set >> collect:thenDo:

Administrator

In reply to this post by Kasper Osterbye

I don't know the correct answer, but I am skeptical of one that relies on a specific implementation rather than a specific definition.

I would like to see and understand the arguments for one interpretation versus another.

Prima facie, the expectation that set behaviour propagates through the implementation is appealing. The correct answer is anything but clear cut

On Sat, Sep 7, 2019, 23:03 Kasper Østerbye <[hidden email]> wrote:

Feature!

collect: forms a new collection of the same kind as its receiver. In this case a set. As the result of your collect:
#(1 2 3) asSet collect: #odd)
is booleans, the resulting set will contain only to elements (the duplicate odd resulting in true is removed).

collect: thenDo: applies the collect-block to each element of the receiver, and then applies the do to each of those results. You can see the implementation of collect:thenDo: in class Collection.

Best,

Kasper

On 7 September 2019 at 17.22.03, Herby Vojčík ([hidden email]) wrote:

Hello!

----
(#(1 2 3) asSet collect: #odd)
do: [ :each | Transcript show: each; cr ]

> true
> false

----

#(1 2 3) asSet collect: #odd
thenDo: [ :each | Transcript show: each; cr ]

> true
> false
> true
----

Bug or feature?

Herby

Kasper Osterbye

Re: Set >> collect:thenDo:

The first version:

(#(1 2 3) asSet collect: #odd)
do: [ :each | Transcript show: each; cr ]

is rather straight forward I believe, as collect: and do: has been around forever (relatively speaking).

#(1 2 3) asSet collect: #odd
thenDo: [ :each | Transcript show: each; cr ]

On 8 September 2019 at 09.13.36, Richard Sargent ([hidden email]) wrote:

I am skeptical of one that relies on a specific implementation rather than a specific definition.

I share your feeling. I am not sure where such a definition would come from. In Squeak it is defined as:

collect: collectBlock thenDo: doBlock

^ (self collect: collectBlock) do: doBlock

In pharo as:

collect: collectBlock thenDo: doBlock

^ self do: [ :each | doBlock value: (collectBlock value: each)]

I might have called the method collect:eachDo:, but we each have slightly different styles. What I like about the pharo version is that is a shorthand for something which is not achieved by mere parentheses.

Best,

Kasper

Peter Kenny

Re: Set >> collect:thenDo:

Two comments:
First, the method comment for Collection>>collect:thenDo: is "Utility method
to improve readability", which is exactly the same as for
collect:thenSelect: and collect:thenReject:. This suggests that the
*intention* of the method is not to introduce new behaviour, but simply to
provide a shorthand for the version with parentheses. For other kinds of
collection this is true; just the deduping makes Set different. If we want
the different behaviour, this should be indicated by method name and
comment.
Second, if we remove asSet from the second snippet, the output is exactly
the same. It will be the same as long as the original collection has no
duplicates. Somehow the effect is to ignore the asSet. It just smells wrong.

Peter Kenny

Kasper Osterbye wrote

> The first version:
>
> (#(1 2 3) asSet collect: #odd)
> do: [ :each | Transcript show: each; cr ]
>
> is rather straight forward I believe, as collect: and do: has been around
> forever (relatively speaking).
>
>
> #(1 2 3) asSet collect: #odd
> thenDo: [ :each | Transcript show: each; cr ]
>
>
> On 8 September 2019 at 09.13.36, Richard Sargent (

> richard.sargent@

> ) wrote:
>
> I am skeptical of one that relies on a specific implementation rather
> than
> a specific definition.
>
> I share your feeling. I am not sure where such a definition would come
> from. In Squeak it is defined as:
>
> collect: collectBlock thenDo: doBlock
>
> ^ (self collect: collectBlock) do: doBlock
>
> In pharo as:
>
> collect: collectBlock thenDo: doBlock
>
> ^ self do: [ :each | doBlock value: (collectBlock value: each)]
>
> I might have called the method collect:eachDo:, but we each have slightly
> different styles. What I like about the pharo version is that is a
> shorthand for something which is not achieved by mere parentheses.
>
> Best,
>
> Kasper

--
Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html

Herby Vojčík

Re: Set >> collect:thenDo:

On 8. 9. 2019 14:28, Peter Kenny wrote:
> Two comments:
> First, the method comment for Collection>>collect:thenDo: is "Utility method
> to improve readability", which is exactly the same as for
> collect:thenSelect: and collect:thenReject:. This suggests that the
> *intention* of the method is not to introduce new behaviour, but simply to
> provide a shorthand for the version with parentheses. For other kinds of

I had that same impression.

> collection this is true; just the deduping makes Set different. If we want

I would be more defensive here and say that generic collection should
have (collect:)do: implementation and only sequenceable collections have
the optimized one (if it indeed is the case that it is a shotrhand for
parenthesized one).

> the different behaviour, this should be indicated by method name and
> comment.
> Second, if we remove asSet from the second snippet, the output is exactly
> the same. It will be the same as long as the original collection has no
> duplicates. Somehow the effect is to ignore the asSet. It just smells wrong.
>
> Peter Kenny

Herby

> Kasper Osterbye wrote
>> The first version:
>>
>> (#(1 2 3) asSet collect: #odd)
>> do: [ :each | Transcript show: each; cr ]
>>
>> is rather straight forward I believe, as collect: and do: has been around
>> forever (relatively speaking).
>>
>>
>> #(1 2 3) asSet collect: #odd
>> thenDo: [ :each | Transcript show: each; cr ]
>>
>>
>> On 8 September 2019 at 09.13.36, Richard Sargent (
>
>> richard.sargent@
>
>> ) wrote:
>>
>> I am skeptical of one that relies on a specific implementation rather
>> than
>> a specific definition.
>>
>> I share your feeling. I am not sure where such a definition would come
>> from. In Squeak it is defined as:
>>
>> collect: collectBlock thenDo: doBlock
>>
>> ^ (self collect: collectBlock) do: doBlock
>>
>> In pharo as:
>>
>> collect: collectBlock thenDo: doBlock
>>
>> ^ self do: [ :each | doBlock value: (collectBlock value: each)]
>>
>> I might have called the method collect:eachDo:, but we each have slightly
>> different styles. What I like about the pharo version is that is a
>> shorthand for something which is not achieved by mere parentheses.
>>
>> Best,
>>
>> Kasper
>
>
>
>
>
> --
> Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html
>

Richard O'Keefe

Re: Set >> collect:thenDo:

In reply to this post by Herby Vojčík

The difference in behaviour is exactly what I would expect.

I can't think of any way to make

aSet collect: collectBlock thenDo: doBlock

act identically to

(aSet collect: collectBlock) do: doBlock

without creating an intermediate set.

Here, for example, is the definition in Smalltalk/X,

with comments removed for brevity.

collect:collectBlock thenDo:aBlock

    self do:[:each | aBlock value:(collectBlock value:each)].

(The lack of spaces after keywords is ST/X convention.)

Another way to express this would be with a function composition

combinator:

aSet do: (collectBlock before: doBlock).

aSet do: (doBlock after: collectBlock).

Note, by the way, that Set>>collect: has always been something

to approach with vigilant trepidation.  Given Set, IdentitySet,

and PluggableSet, there is in general no reason to expect that

the receiver's notion of equality is approach for the results

of the collectBlock.  That's why my library lets me write

  Set withAll: collection collect: collectBlock

  IdentitySet withAll: collection collect: collectBlock

  (PluggableSet equalBy: equality hashBy: hasher) withAll: collection collect: collectBlock

  (PluggableSet accordingTo: equivalence) withAll: collection collect: collectBlock

So in the case of

  aSet collect: collectBlock thenDo: doBlock

there is no reason to expect aSet to know how to tell when two

collectBlock results should be regarded as equivalent.  Seriously,

how is aSet supposed to know what collectBlock thinks equivalence

looks like?

On Sun, 8 Sep 2019 at 03:22, Herby Vojčík <[hidden email]> wrote:

Hello!

----
(#(1 2 3) asSet collect: #odd)
do: [ :each | Transcript show: each; cr ]

> true
> false

----

#(1 2 3) asSet collect: #odd
thenDo: [ :each | Transcript show: each; cr ]

> true
> false
> true
----

Bug or feature?

Herby

Richard O'Keefe

Re: Set >> collect:thenDo:

In reply to this post by Herby Vojčík

(1) If you want (aSet collect: collectBlock) do: doBlock

    you can write exactly that.  Nothing stops you, and it will

    be as clear and reliable as any use of Set>>collect:, which

    is to say NOT VERY CLEAR OR RELIABLE AT ALL.

(2) #collect:then{Do:Select:Reject:} had no other purpose than

    to avoid creating an intermediate and otherwise useless

    collection.  If you are not trying to involve an intermediate

    set then it just plain wrong to use #collect:thenDo:.

(3) Oddly enough, the reason that #collect:thenDo: exists in my

    library is that I copied it from Squeak, at a time when it had

    the same definition as Pharo and ST/X.  Had I known of the change

    in Squeak I would have bitterly opposed it.  The comment in the

    current Squeak code, that it is for readability, is 100% the

    reverse of the truth.  Using the version with parentheses is WAY

    clearer than using the portmanteau method.  Sigh.  I see they

    broke #collect:thenSelect: in the same way.

(4) Let me offer you another member of the #collect:then* family.

    Collection

      collect: collectBlock thenInject: initial into: injectBlock

|r|

        r := initial.

        self do: [:each |

          r := injectBlock value: r value: (collectBlock value: each)].

^r

    #(now is the hour for us to say goodbye) asSet

      collect: [:each | each size]

      thenInject: 0 into: [:acc :each | acc + each]

 => 29

    (#(now is the hour for us to say goodbye) asSet
      collect: [:each | each size])
      inject: 0 into: [:acc :each | acc + each]
 => 16

   That is, it would be WRONG to implement #collect:thenInject:into:
   as #collect: followed by #inject:into:.  The point is NOT to
   coalesce things that the receiver might (in general, incorrectly)
   regard as equal.

(5) The bottom line is that if #collect:thenDo: and its relatives did
    not have their present semantics in Pharo (and ST/X), they would
    need to be reinvented, with names that were not as clear.

(1) Just to repeat for emphasis, if you *want* (_ collect: _) do: _
    then that is exactly what you should write.  There is no
    excuse for using #collect:thenDo: in that case.  It is NOT
    clearer to do so.  And you should imagine me jumping up and
    down screaming that sending #collect: to a set is a bad code
    smell which demands very explicit documentation as to why you
    (for example) want a Set to answer a Set but an IdentitySet
    to also answer a Set, not the expected IdentitySet.  (I have
    been bitten by that more than once.)

On Mon, 9 Sep 2019 at 01:33, Herby Vojčík <[hidden email]> wrote:

On 8. 9. 2019 14:28, Peter Kenny wrote:
> Two comments:
> First, the method comment for Collection>>collect:thenDo: is "Utility method
> to improve readability", which is exactly the same as for
> collect:thenSelect: and collect:thenReject:. This suggests that the
> *intention* of the method is not to introduce new behaviour, but simply to
> provide a shorthand for the version with parentheses. For other kinds of

I had that same impression.

> collection this is true; just the deduping makes Set different. If we want

I would be more defensive here and say that generic collection should
have (collect:)do: implementation and only sequenceable collections have
the optimized one (if it indeed is the case that it is a shotrhand for
parenthesized one).

> the different behaviour, this should be indicated by method name and
> comment.
> Second, if we remove asSet from the second snippet, the output is exactly
> the same. It will be the same as long as the original collection has no
> duplicates. Somehow the effect is to ignore the asSet. It just smells wrong.
>
> Peter Kenny

Herby

> Kasper Osterbye wrote
>> The first version:
>>
>> (#(1 2 3) asSet collect: #odd)
>> do: [ :each | Transcript show: each; cr ]
>>
>> is rather straight forward I believe, as collect: and do: has been around
>> forever (relatively speaking).
>>
>>
>> #(1 2 3) asSet collect: #odd
>> thenDo: [ :each | Transcript show: each; cr ]
>>
>>
>> On 8 September 2019 at 09.13.36, Richard Sargent (
>
>> richard.sargent@
>
>> ) wrote:
>>
>> I am skeptical of one that relies on a specific implementation rather
>> than
>> a specific definition.
>>
>> I share your feeling. I am not sure where such a definition would come
>> from. In Squeak it is defined as:
>>
>> collect: collectBlock thenDo: doBlock
>>
>> ^ (self collect: collectBlock) do: doBlock
>>
>> In pharo as:
>>
>> collect: collectBlock thenDo: doBlock
>>
>> ^ self do: [ :each | doBlock value: (collectBlock value: each)]
>>
>> I might have called the method collect:eachDo:, but we each have slightly
>> different styles. What I like about the pharo version is that is a
>> shorthand for something which is not achieved by mere parentheses.
>>
>> Best,
>>
>> Kasper
>
>
>
>
>
> --
> Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html
>

David T. Lewis

Re: Set >> collect:thenDo:

On Mon, Sep 09, 2019 at 05:46:48PM +1200, Richard O'Keefe wrote:

>
> (3) Oddly enough, the reason that #collect:thenDo: exists in my
> library is that I copied it from Squeak, at a time when it had
> the same definition as Pharo and ST/X. Had I known of the change
> in Squeak I would have bitterly opposed it. The comment in the
> current Squeak code, that it is for readability, is 100% the
> reverse of the truth. Using the version with parentheses is WAY
> clearer than using the portmanteau method. Sigh. I see they
> broke #collect:thenSelect: in the same way.
>

In Squeak, collect:thenDo: is implemented only in Collection.
The method stamp is dgd 9/13/2004 23:42, which hardly qualifies as
a recent change. And the method stamp for collect:thenSelect: is
sma 5/12/2000.

Dave

Steffen Märcker

Re: Set >> collect:thenDo:

In reply to this post by Richard O'Keefe

I think this thread indicates that the selector #collect:thenDo: may
indeed cause some missunderstanding given standard semantics of #collect:.
Is there any reason not to use a different one, such as #map:thenDo:, to
make the difference clearer?

A nice side effect could be a new automatic refactoring rule that
consolidates chained #collect: and #do: sends into a faster
#collect:thenDo: send without breaking existing code.

Am .09.2019, 07:46 Uhr, schrieb Richard O'Keefe <[hidden email]>:

> (1) If you want (aSet collect: collectBlock) do: doBlock
> you can write exactly that. Nothing stops you, and it will
> be as clear and reliable as any use of Set>>collect:, which
> is to say NOT VERY CLEAR OR RELIABLE AT ALL.
>
> (2) #collect:then{Do:Select:Reject:} had no other purpose than
> to avoid creating an intermediate and otherwise useless
> collection. If you are not trying to involve an intermediate
> set then it just plain wrong to use #collect:thenDo:.
>
> (3) Oddly enough, the reason that #collect:thenDo: exists in my
> library is that I copied it from Squeak, at a time when it had
> the same definition as Pharo and ST/X. Had I known of the change
> in Squeak I would have bitterly opposed it. The comment in the
> current Squeak code, that it is for readability, is 100% the
> reverse of the truth. Using the version with parentheses is WAY
> clearer than using the portmanteau method. Sigh. I see they
> broke #collect:thenSelect: in the same way.
>
> (4) Let me offer you another member of the #collect:then* family.
>
> Collection
> collect: collectBlock thenInject: initial into: injectBlock
> |r|
> r := initial.
> self do: [:each |
> r := injectBlock value: r value: (collectBlock value: each)].
> ^r
>
> #(now is the hour for us to say goodbye) asSet
> collect: [:each | each size]
> thenInject: 0 into: [:acc :each | acc + each]
> => 29
> (#(now is the hour for us to say goodbye) asSet
> collect: [:each | each size])
> inject: 0 into: [:acc :each | acc + each]
> => 16
>
> That is, it would be WRONG to implement #collect:thenInject:into:
> as #collect: followed by #inject:into:. The point is NOT to
> coalesce things that the receiver might (in general, incorrectly)
> regard as equal.
>
> (5) The bottom line is that if #collect:thenDo: and its relatives did
> not have their present semantics in Pharo (and ST/X), they would
> need to be reinvented, with names that were not as clear.
>
> (1) Just to repeat for emphasis, if you *want* (_ collect: _) do: _
> then that is exactly what you should write. There is no
> excuse for using #collect:thenDo: in that case. It is NOT
> clearer to do so. And you should imagine me jumping up and
> down screaming that sending #collect: to a set is a bad code
> smell which demands very explicit documentation as to why you
> (for example) want a Set to answer a Set but an IdentitySet
> to also answer a Set, not the expected IdentitySet. (I have
> been bitten by that more than once.)
>
>
>
> On Mon, 9 Sep 2019 at 01:33, Herby Vojčík <[hidden email]> wrote:
>
>> On 8. 9. 2019 14:28, Peter Kenny wrote:
>> > Two comments:
>> > First, the method comment for Collection>>collect:thenDo: is "Utility
>> method
>> > to improve readability", which is exactly the same as for
>> > collect:thenSelect: and collect:thenReject:. This suggests that the
>> > *intention* of the method is not to introduce new behaviour, but
>> simply
>> to
>> > provide a shorthand for the version with parentheses. For other kinds
>> of
>>
>> I had that same impression.
>>
>> > collection this is true; just the deduping makes Set different. If we
>> want
>>
>> I would be more defensive here and say that generic collection should
>> have (collect:)do: implementation and only sequenceable collections have
>> the optimized one (if it indeed is the case that it is a shotrhand for
>> parenthesized one).
>>
>> > the different behaviour, this should be indicated by method name and
>> > comment.
>> > Second, if we remove asSet from the second snippet, the output is
>> exactly
>> > the same. It will be the same as long as the original collection has
>> no
>> > duplicates. Somehow the effect is to ignore the asSet. It just smells
>> wrong.
>> >
>> > Peter Kenny
>>
>> Herby
>>
>> > Kasper Osterbye wrote
>> >> The first version:
>> >>
>> >> (#(1 2 3) asSet collect: #odd)
>> >> do: [ :each | Transcript show: each; cr ]
>> >>
>> >> is rather straight forward I believe, as collect: and do: has been
>> around
>> >> forever (relatively speaking).
>> >>
>> >>
>> >> #(1 2 3) asSet collect: #odd
>> >> thenDo: [ :each | Transcript show: each; cr ]
>> >>
>> >>
>> >> On 8 September 2019 at 09.13.36, Richard Sargent (
>> >
>> >> richard.sargent@
>> >
>> >> ) wrote:
>> >>
>> >> I am skeptical of one that relies on a specific implementation
>> rather
>> >> than
>> >> a specific definition.
>> >>
>> >> I share your feeling. I am not sure where such a definition would
>> come
>> >> from. In Squeak it is defined as:
>> >>
>> >> collect: collectBlock thenDo: doBlock
>> >>
>> >> ^ (self collect: collectBlock) do: doBlock
>> >>
>> >> In pharo as:
>> >>
>> >> collect: collectBlock thenDo: doBlock
>> >>
>> >> ^ self do: [ :each | doBlock value: (collectBlock value: each)]
>> >>
>> >> I might have called the method collect:eachDo:, but we each have
>> slightly
>> >> different styles. What I like about the pharo version is that is a
>> >> shorthand for something which is not achieved by mere parentheses.
>> >>
>> >> Best,
>> >>
>> >> Kasper
>> >
>> >
>> >
>> >
>> >
>> > --
>> > Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html
>> >
>>
>>
>>

Richard O'Keefe

Re: Set >> collect:thenDo:

In reply to this post by David T. Lewis

Digging a bit further, the change from the pointless implementation to

the current implementation was made between Pharo 2.0 and Pharo 3.0

and the current definition was introduced into ST/X some time between

2010 and 2012, which is about the same time as Pharo.

{collect,select,reject}:thenDo: were added to my library on

2004.09.14, exactly a day after the version now in Squeak.  Which

leads me to believe that I picked the method up from a previous

version of Squeak and that some of the Squeak version history has

been lost.

By the way, it's not just Sets where the current Squeak definition is

a pain in the posterior.  Consider the following example.

|o|

  o := OrderedCollection new.

  #[3 1 4 1 5 9]

    collect: [:byte | byte asFloat]

    thenDo: [:float | o addLast: float].

  Transcript print: o; cr.

This works fine in Pharo, in Smalltalk/X, and in my library.

It raises an exception in Squeak.

Or try

   #[3 1 4 1 5 9]

     collect: [:byte | byte odd ifTrue: [byte printString]

                                ifFalse: [byte]]

     thenSelect: [:each | each isNumber]

In Pharo and my library, it straightforwardly answers #[4].

In Squeak, it raises an exception.

So that's sets where the current Squeak definition can give you

trouble, and ByteArrays, and we are not done yet!  Pretty much

anything other than an Array or an OrderedCollection is going to

get you in trouble in Squeak.  Just think of the fun you can

have with SortedCollection.  Ooh, the list goes on and on.

So, the Squeak definition

* does not improve readability.

  In fact, it REDUCES readability because the human reader stops

  to ask "why is this portmanteau method used rather than the

  obvious code" and several minutes later discovers that there

  is NO reason.

* is useless with most collection classes.

* does nothing to improve performance but just adds overhead.

Are we done yet?

On Tue, 10 Sep 2019 at 01:27, David T. Lewis <[hidden email]> wrote:

On Mon, Sep 09, 2019 at 05:46:48PM +1200, Richard O'Keefe wrote:
>
> (3) Oddly enough, the reason that #collect:thenDo: exists in my
> library is that I copied it from Squeak, at a time when it had
> the same definition as Pharo and ST/X. Had I known of the change
> in Squeak I would have bitterly opposed it. The comment in the
> current Squeak code, that it is for readability, is 100% the
> reverse of the truth. Using the version with parentheses is WAY
> clearer than using the portmanteau method. Sigh. I see they
> broke #collect:thenSelect: in the same way.
>

In Squeak, collect:thenDo: is implemented only in Collection.
The method stamp is dgd 9/13/2004 23:42, which hardly qualifies as
a recent change. And the method stamp for collect:thenSelect: is
sma 5/12/2000.

Dave

Richard O'Keefe

Re: Set >> collect:thenDo:

In reply to this post by Steffen Märcker

An automatic refactoring rule that converts

  (a collect: b) do: c

to

  (a collect: b thenDo: c)

would change the semantics of the code in all of

Squeak, Pharo, and Smalltalk/X, whichever definition

of #collect:thenDo: is taken, and in Squeak would

make the code slower, not faster.

Using a different name *would* reduce confusion.

However, there is an alternative.

My library copied the #virtual{Collect,Select,Reject}:

interface from Strongtalk.  As the comment puts it,

 The idea is to do #select:, #reject:, #collect:, and some
 other collection->collection operations lazily.  As long
 as you just use #do: &c, the operations will be referred
 back to the original collection, so that there is less
 need for fusion methods like #select:thenCollect:.

...

 You can do
        (((( coll
            virtualSelect: [..])
            virtualCollect: [...])
            virtualSelect: [..])
            virtualCollect: [...])
          asCollection

  and only *one* new collection is actually built.

So

  (aCollection virtualCollect: transformBlock)

    do: actionBlock

*does* build an intermediate object (with two instance

variables) but *doesn't* build a whole intermediate

collection.  And that combination *could* be fused by

an automatic rule without breaking anything.

Does that sound like a way forward?

I could convert my implementation of this interface to

Pharo if people would like me to.

On Tue, 10 Sep 2019 at 01:51, Steffen Märcker <[hidden email]> wrote:

I think this thread indicates that the selector #collect:thenDo: may
indeed cause some missunderstanding given standard semantics of #collect:.
Is there any reason not to use a different one, such as #map:thenDo:, to
make the difference clearer?

A nice side effect could be a new automatic refactoring rule that
consolidates chained #collect: and #do: sends into a faster
#collect:thenDo: send without breaking existing code.

Am .09.2019, 07:46 Uhr, schrieb Richard O'Keefe <[hidden email]>:

> (1) If you want (aSet collect: collectBlock) do: doBlock
> you can write exactly that. Nothing stops you, and it will
> be as clear and reliable as any use of Set>>collect:, which
> is to say NOT VERY CLEAR OR RELIABLE AT ALL.
>
> (2) #collect:then{Do:Select:Reject:} had no other purpose than
> to avoid creating an intermediate and otherwise useless
> collection. If you are not trying to involve an intermediate
> set then it just plain wrong to use #collect:thenDo:.
>
> (3) Oddly enough, the reason that #collect:thenDo: exists in my
> library is that I copied it from Squeak, at a time when it had
> the same definition as Pharo and ST/X. Had I known of the change
> in Squeak I would have bitterly opposed it. The comment in the
> current Squeak code, that it is for readability, is 100% the
> reverse of the truth. Using the version with parentheses is WAY
> clearer than using the portmanteau method. Sigh. I see they
> broke #collect:thenSelect: in the same way.
>
> (4) Let me offer you another member of the #collect:then* family.
>
> Collection
> collect: collectBlock thenInject: initial into: injectBlock
> |r|
> r := initial.
> self do: [:each |
> r := injectBlock value: r value: (collectBlock value: each)].
> ^r
>
> #(now is the hour for us to say goodbye) asSet
> collect: [:each | each size]
> thenInject: 0 into: [:acc :each | acc + each]
> => 29
> (#(now is the hour for us to say goodbye) asSet
> collect: [:each | each size])
> inject: 0 into: [:acc :each | acc + each]
> => 16
>
> That is, it would be WRONG to implement #collect:thenInject:into:
> as #collect: followed by #inject:into:. The point is NOT to
> coalesce things that the receiver might (in general, incorrectly)
> regard as equal.
>
> (5) The bottom line is that if #collect:thenDo: and its relatives did
> not have their present semantics in Pharo (and ST/X), they would
> need to be reinvented, with names that were not as clear.
>
> (1) Just to repeat for emphasis, if you *want* (_ collect: _) do: _
> then that is exactly what you should write. There is no
> excuse for using #collect:thenDo: in that case. It is NOT
> clearer to do so. And you should imagine me jumping up and
> down screaming that sending #collect: to a set is a bad code
> smell which demands very explicit documentation as to why you
> (for example) want a Set to answer a Set but an IdentitySet
> to also answer a Set, not the expected IdentitySet. (I have
> been bitten by that more than once.)
>
>
>
> On Mon, 9 Sep 2019 at 01:33, Herby Vojčík <[hidden email]> wrote:
>
>> On 8. 9. 2019 14:28, Peter Kenny wrote:
>> > Two comments:
>> > First, the method comment for Collection>>collect:thenDo: is "Utility
>> method
>> > to improve readability", which is exactly the same as for
>> > collect:thenSelect: and collect:thenReject:. This suggests that the
>> > *intention* of the method is not to introduce new behaviour, but
>> simply
>> to
>> > provide a shorthand for the version with parentheses. For other kinds
>> of
>>
>> I had that same impression.
>>
>> > collection this is true; just the deduping makes Set different. If we
>> want
>>
>> I would be more defensive here and say that generic collection should
>> have (collect:)do: implementation and only sequenceable collections have
>> the optimized one (if it indeed is the case that it is a shotrhand for
>> parenthesized one).
>>
>> > the different behaviour, this should be indicated by method name and
>> > comment.
>> > Second, if we remove asSet from the second snippet, the output is
>> exactly
>> > the same. It will be the same as long as the original collection has
>> no
>> > duplicates. Somehow the effect is to ignore the asSet. It just smells
>> wrong.
>> >
>> > Peter Kenny
>>
>> Herby
>>
>> > Kasper Osterbye wrote
>> >> The first version:
>> >>
>> >> (#(1 2 3) asSet collect: #odd)
>> >> do: [ :each | Transcript show: each; cr ]
>> >>
>> >> is rather straight forward I believe, as collect: and do: has been
>> around
>> >> forever (relatively speaking).
>> >>
>> >>
>> >> #(1 2 3) asSet collect: #odd
>> >> thenDo: [ :each | Transcript show: each; cr ]
>> >>
>> >>
>> >> On 8 September 2019 at 09.13.36, Richard Sargent (
>> >
>> >> richard.sargent@
>> >
>> >> ) wrote:
>> >>
>> >> I am skeptical of one that relies on a specific implementation
>> rather
>> >> than
>> >> a specific definition.
>> >>
>> >> I share your feeling. I am not sure where such a definition would
>> come
>> >> from. In Squeak it is defined as:
>> >>
>> >> collect: collectBlock thenDo: doBlock
>> >>
>> >> ^ (self collect: collectBlock) do: doBlock
>> >>
>> >> In pharo as:
>> >>
>> >> collect: collectBlock thenDo: doBlock
>> >>
>> >> ^ self do: [ :each | doBlock value: (collectBlock value: each)]
>> >>
>> >> I might have called the method collect:eachDo:, but we each have
>> slightly
>> >> different styles. What I like about the pharo version is that is a
>> >> shorthand for something which is not achieved by mere parentheses.
>> >>
>> >> Best,
>> >>
>> >> Kasper
>> >
>> >
>> >
>> >
>> >
>> > --
>> > Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html
>> >
>>
>>
>>

Kasper Osterbye

Lazy Streams - was: Re: Set >> collect:thenDo:

On 10 September 2019 at 00.56.28, Richard O'Keefe ([hidden email]) wrote:

Does that sound like a way forward?
I could convert my implementation of this interface to
Pharo if people would like me to.

Hi Richard,

It seems like what you propose (in sofar as I understand it) is already present in at least three other libraries which have not yet made it to the core of Pharo:

The Iterator framework Julien is working on: https://github.com/juliendelplanque/Iterators

The XStreams framework: https://code.google.com/archive/p/xtreams/

The transducers: https://github.com/Pharophile/Transducers

I was not aware of the virtual collections from Strongtalk.

I am aware you know them (as you commented on the Iterators Julien proposed).

I have been working with the LINQ stream/iterator framework in C#, and found that to be a rather sound library, lazy evaluation. The streams in Java are also nice, and are actually very good at exploiting multi cores.

It is sad there is no lazy streams in the core of Pharo, and I would find it intersting to participate in a project to build one.

Best,

Kasper

Steffen Märcker

Re: Lazy Streams - was: Re: Set >> collect:thenDo:

Hi,

I really think we should have an efficient way to chain operations on
collections and alike. In my personal opinion, this should be backed by
transducers, as they decouple the operations from the type the data source
and sink. Hence they are applicable to different collection types, streams
and so on. Also, transducers allow for straight-forward parallization of
operations that are concurrency-safe.

However, the transducers port is not complete yet. I think I'll finish it
in October/November, but definitely this year. Maybe this could be a
starting point to discuss how to move forward?

One related question: Is there a somewhat comprehensive benchmark yielding
a baseline of Pharo's collection/streams/iteration performance? I'd really
like to compare the existing implemenation to the alternative approaches.

Best regards,
Steffen

Am .09.2019, 11:30 Uhr, schrieb Kasper Østerbye
<[hidden email]>:

> On 10 September 2019 at 00.56.28, Richard O'Keefe ([hidden email])
> wrote:
>
> Does that sound like a way forward?
> I could convert my implementation of this interface to
> Pharo if people would like me to.
>
> Hi Richard,
>
> It seems like what you propose (in sofar as I understand it) is already
> present in at least three other libraries which have not yet made it to
> the
> core of Pharo:
>
> The Iterator framework Julien is working on:
> https://github.com/juliendelplanque/Iterators
>
> The XStreams framework: https://code.google.com/archive/p/xtreams/
> <https://code.google.com/archive/p/xtreams/>
>
> The transducers: https://github.com/Pharophile/Transducers
> <https://github.com/Pharophile/Transducers>
>
> I was not aware of the virtual collections from Strongtalk.
>
> I am aware you know them (as you commented on the Iterators Julien
> proposed).
>
> I have been working with the LINQ stream/iterator framework in C#, and
> found that to be a rather sound library, lazy evaluation. The streams in
> Java are also nice, and are actually very good at exploiting multi cores.
>
> It is sad there is no lazy streams in the core of Pharo, and I would find
> it intersting to participate in a project to build one.
>
> Best,
>
> Kasper