Hello!
---- (#(1 2 3) asSet collect: #odd) do: [ :each | Transcript show: each; cr ] > true > false ---- #(1 2 3) asSet collect: #odd thenDo: [ :each | Transcript show: each; cr ] > true > false > true ---- Bug or feature? Herby |
Feature! collect: forms a new collection of the same kind as its receiver. In this case a set. As the result of your collect: #(1 2 3) asSet collect: #odd) is booleans, the resulting set will contain only to elements (the duplicate odd resulting in true is removed). collect: thenDo: applies the collect-block to each element of the receiver, and then applies the do to each of those results. You can see the implementation of collect:thenDo: in class Collection. Best, Kasper On 7 September 2019 at 17.22.03, Herby Vojčík ([hidden email]) wrote:
|
Surprise!
The selector #collect:thenDo: strongly suggests that it behaves just as #collect: then #do:. But as #collect: usually means map + aggregate in the reveiver type, I'd expect the input to the do block to be deduped already. At least it is an easy to miss source of subtle bugs. Maybe an additional method #map:thenDo: would make sense? Best, Steffen Am 8. September 2019 08:02:30 MESZ schrieb "Kasper Østerbye" <[hidden email]>:
|
Administrator
|
In reply to this post by Kasper Osterbye
I don't know the correct answer, but I am skeptical of one that relies on a specific implementation rather than a specific definition. I would like to see and understand the arguments for one interpretation versus another. Prima facie, the expectation that set behaviour propagates through the implementation is appealing. The correct answer is anything but clear cut On Sat, Sep 7, 2019, 23:03 Kasper Østerbye <[hidden email]> wrote:
|
The first version: (#(1 2 3) asSet collect: #odd) is rather straight forward I believe, as collect: and do: has been around forever (relatively speaking). #(1 2 3) asSet collect: #odd On 8 September 2019 at 09.13.36, Richard Sargent ([hidden email]) wrote:
I share your feeling. I am not sure where such a definition would come from. In Squeak it is defined as: collect: collectBlock thenDo: doBlock ^ (self collect: collectBlock) do: doBlock In pharo as: collect: collectBlock thenDo: doBlock ^ self do: [ :each | doBlock value: (collectBlock value: each)] I might have called the method collect:eachDo:, but we each have slightly different styles. What I like about the pharo version is that is a shorthand for something which is not achieved by mere parentheses. Best, Kasper |
Two comments:
First, the method comment for Collection>>collect:thenDo: is "Utility method to improve readability", which is exactly the same as for collect:thenSelect: and collect:thenReject:. This suggests that the *intention* of the method is not to introduce new behaviour, but simply to provide a shorthand for the version with parentheses. For other kinds of collection this is true; just the deduping makes Set different. If we want the different behaviour, this should be indicated by method name and comment. Second, if we remove asSet from the second snippet, the output is exactly the same. It will be the same as long as the original collection has no duplicates. Somehow the effect is to ignore the asSet. It just smells wrong. Peter Kenny Kasper Osterbye wrote > The first version: > > (#(1 2 3) asSet collect: #odd) > do: [ :each | Transcript show: each; cr ] > > is rather straight forward I believe, as collect: and do: has been around > forever (relatively speaking). > > > #(1 2 3) asSet collect: #odd > thenDo: [ :each | Transcript show: each; cr ] > > > On 8 September 2019 at 09.13.36, Richard Sargent ( > richard.sargent@ > ) wrote: > > I am skeptical of one that relies on a specific implementation rather > than > a specific definition. > > I share your feeling. I am not sure where such a definition would come > from. In Squeak it is defined as: > > collect: collectBlock thenDo: doBlock > > ^ (self collect: collectBlock) do: doBlock > > In pharo as: > > collect: collectBlock thenDo: doBlock > > ^ self do: [ :each | doBlock value: (collectBlock value: each)] > > I might have called the method collect:eachDo:, but we each have slightly > different styles. What I like about the pharo version is that is a > shorthand for something which is not achieved by mere parentheses. > > Best, > > Kasper -- Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html |
On 8. 9. 2019 14:28, Peter Kenny wrote:
> Two comments: > First, the method comment for Collection>>collect:thenDo: is "Utility method > to improve readability", which is exactly the same as for > collect:thenSelect: and collect:thenReject:. This suggests that the > *intention* of the method is not to introduce new behaviour, but simply to > provide a shorthand for the version with parentheses. For other kinds of I had that same impression. > collection this is true; just the deduping makes Set different. If we want I would be more defensive here and say that generic collection should have (collect:)do: implementation and only sequenceable collections have the optimized one (if it indeed is the case that it is a shotrhand for parenthesized one). > the different behaviour, this should be indicated by method name and > comment. > Second, if we remove asSet from the second snippet, the output is exactly > the same. It will be the same as long as the original collection has no > duplicates. Somehow the effect is to ignore the asSet. It just smells wrong. > > Peter Kenny Herby > Kasper Osterbye wrote >> The first version: >> >> (#(1 2 3) asSet collect: #odd) >> do: [ :each | Transcript show: each; cr ] >> >> is rather straight forward I believe, as collect: and do: has been around >> forever (relatively speaking). >> >> >> #(1 2 3) asSet collect: #odd >> thenDo: [ :each | Transcript show: each; cr ] >> >> >> On 8 September 2019 at 09.13.36, Richard Sargent ( > >> richard.sargent@ > >> ) wrote: >> >> I am skeptical of one that relies on a specific implementation rather >> than >> a specific definition. >> >> I share your feeling. I am not sure where such a definition would come >> from. In Squeak it is defined as: >> >> collect: collectBlock thenDo: doBlock >> >> ^ (self collect: collectBlock) do: doBlock >> >> In pharo as: >> >> collect: collectBlock thenDo: doBlock >> >> ^ self do: [ :each | doBlock value: (collectBlock value: each)] >> >> I might have called the method collect:eachDo:, but we each have slightly >> different styles. What I like about the pharo version is that is a >> shorthand for something which is not achieved by mere parentheses. >> >> Best, >> >> Kasper > > > > > > -- > Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html > |
In reply to this post by Herby Vojčík
The difference in behaviour is exactly what I would expect. I can't think of any way to make aSet collect: collectBlock thenDo: doBlock act identically to (aSet collect: collectBlock) do: doBlock without creating an intermediate set. Here, for example, is the definition in Smalltalk/X, with comments removed for brevity. collect:collectBlock thenDo:aBlock self do:[:each | aBlock value:(collectBlock value:each)]. (The lack of spaces after keywords is ST/X convention.) Another way to express this would be with a function composition combinator: aSet do: (collectBlock before: doBlock). aSet do: (doBlock after: collectBlock). Note, by the way, that Set>>collect: has always been something to approach with vigilant trepidation. Given Set, IdentitySet, and PluggableSet, there is in general no reason to expect that the receiver's notion of equality is approach for the results of the collectBlock. That's why my library lets me write Set withAll: collection collect: collectBlock IdentitySet withAll: collection collect: collectBlock (PluggableSet equalBy: equality hashBy: hasher) withAll: collection collect: collectBlock (PluggableSet accordingTo: equivalence) withAll: collection collect: collectBlock So in the case of aSet collect: collectBlock thenDo: doBlock there is no reason to expect aSet to know how to tell when two collectBlock results should be regarded as equivalent. Seriously, how is aSet supposed to know what collectBlock thinks equivalence looks like? On Sun, 8 Sep 2019 at 03:22, Herby Vojčík <[hidden email]> wrote: Hello! |
In reply to this post by Herby Vojčík
(1) If you want (aSet collect: collectBlock) do: doBlock you can write exactly that. Nothing stops you, and it will be as clear and reliable as any use of Set>>collect:, which is to say NOT VERY CLEAR OR RELIABLE AT ALL. (2) #collect:then{Do:Select:Reject:} had no other purpose than to avoid creating an intermediate and otherwise useless collection. If you are not trying to involve an intermediate set then it just plain wrong to use #collect:thenDo:. (3) Oddly enough, the reason that #collect:thenDo: exists in my library is that I copied it from Squeak, at a time when it had the same definition as Pharo and ST/X. Had I known of the change in Squeak I would have bitterly opposed it. The comment in the current Squeak code, that it is for readability, is 100% the reverse of the truth. Using the version with parentheses is WAY clearer than using the portmanteau method. Sigh. I see they broke #collect:thenSelect: in the same way. (4) Let me offer you another member of the #collect:then* family. Collection collect: collectBlock thenInject: initial into: injectBlock |r| r := initial. self do: [:each | r := injectBlock value: r value: (collectBlock value: each)]. ^r #(now is the hour for us to say goodbye) asSet collect: [:each | each size] thenInject: 0 into: [:acc :each | acc + each] => 29 (#(now is the hour for us to say goodbye) asSet collect: [:each | each size]) inject: 0 into: [:acc :each | acc + each] => 16 That is, it would be WRONG to implement #collect:thenInject:into: as #collect: followed by #inject:into:. The point is NOT to coalesce things that the receiver might (in general, incorrectly) regard as equal. (5) The bottom line is that if #collect:thenDo: and its relatives did not have their present semantics in Pharo (and ST/X), they would need to be reinvented, with names that were not as clear. (1) Just to repeat for emphasis, if you *want* (_ collect: _) do: _ then that is exactly what you should write. There is no excuse for using #collect:thenDo: in that case. It is NOT clearer to do so. And you should imagine me jumping up and down screaming that sending #collect: to a set is a bad code smell which demands very explicit documentation as to why you (for example) want a Set to answer a Set but an IdentitySet to also answer a Set, not the expected IdentitySet. (I have been bitten by that more than once.) On Mon, 9 Sep 2019 at 01:33, Herby Vojčík <[hidden email]> wrote: On 8. 9. 2019 14:28, Peter Kenny wrote: |
On Mon, Sep 09, 2019 at 05:46:48PM +1200, Richard O'Keefe wrote:
> > (3) Oddly enough, the reason that #collect:thenDo: exists in my > library is that I copied it from Squeak, at a time when it had > the same definition as Pharo and ST/X. Had I known of the change > in Squeak I would have bitterly opposed it. The comment in the > current Squeak code, that it is for readability, is 100% the > reverse of the truth. Using the version with parentheses is WAY > clearer than using the portmanteau method. Sigh. I see they > broke #collect:thenSelect: in the same way. > In Squeak, collect:thenDo: is implemented only in Collection. The method stamp is dgd 9/13/2004 23:42, which hardly qualifies as a recent change. And the method stamp for collect:thenSelect: is sma 5/12/2000. Dave |
In reply to this post by Richard O'Keefe
I think this thread indicates that the selector #collect:thenDo: may
indeed cause some missunderstanding given standard semantics of #collect:. Is there any reason not to use a different one, such as #map:thenDo:, to make the difference clearer? A nice side effect could be a new automatic refactoring rule that consolidates chained #collect: and #do: sends into a faster #collect:thenDo: send without breaking existing code. Am .09.2019, 07:46 Uhr, schrieb Richard O'Keefe <[hidden email]>: > (1) If you want (aSet collect: collectBlock) do: doBlock > you can write exactly that. Nothing stops you, and it will > be as clear and reliable as any use of Set>>collect:, which > is to say NOT VERY CLEAR OR RELIABLE AT ALL. > > (2) #collect:then{Do:Select:Reject:} had no other purpose than > to avoid creating an intermediate and otherwise useless > collection. If you are not trying to involve an intermediate > set then it just plain wrong to use #collect:thenDo:. > > (3) Oddly enough, the reason that #collect:thenDo: exists in my > library is that I copied it from Squeak, at a time when it had > the same definition as Pharo and ST/X. Had I known of the change > in Squeak I would have bitterly opposed it. The comment in the > current Squeak code, that it is for readability, is 100% the > reverse of the truth. Using the version with parentheses is WAY > clearer than using the portmanteau method. Sigh. I see they > broke #collect:thenSelect: in the same way. > > (4) Let me offer you another member of the #collect:then* family. > > Collection > collect: collectBlock thenInject: initial into: injectBlock > |r| > r := initial. > self do: [:each | > r := injectBlock value: r value: (collectBlock value: each)]. > ^r > > #(now is the hour for us to say goodbye) asSet > collect: [:each | each size] > thenInject: 0 into: [:acc :each | acc + each] > => 29 > (#(now is the hour for us to say goodbye) asSet > collect: [:each | each size]) > inject: 0 into: [:acc :each | acc + each] > => 16 > > That is, it would be WRONG to implement #collect:thenInject:into: > as #collect: followed by #inject:into:. The point is NOT to > coalesce things that the receiver might (in general, incorrectly) > regard as equal. > > (5) The bottom line is that if #collect:thenDo: and its relatives did > not have their present semantics in Pharo (and ST/X), they would > need to be reinvented, with names that were not as clear. > > (1) Just to repeat for emphasis, if you *want* (_ collect: _) do: _ > then that is exactly what you should write. There is no > excuse for using #collect:thenDo: in that case. It is NOT > clearer to do so. And you should imagine me jumping up and > down screaming that sending #collect: to a set is a bad code > smell which demands very explicit documentation as to why you > (for example) want a Set to answer a Set but an IdentitySet > to also answer a Set, not the expected IdentitySet. (I have > been bitten by that more than once.) > > > > On Mon, 9 Sep 2019 at 01:33, Herby Vojčík <[hidden email]> wrote: > >> On 8. 9. 2019 14:28, Peter Kenny wrote: >> > Two comments: >> > First, the method comment for Collection>>collect:thenDo: is "Utility >> method >> > to improve readability", which is exactly the same as for >> > collect:thenSelect: and collect:thenReject:. This suggests that the >> > *intention* of the method is not to introduce new behaviour, but >> simply >> to >> > provide a shorthand for the version with parentheses. For other kinds >> of >> >> I had that same impression. >> >> > collection this is true; just the deduping makes Set different. If we >> want >> >> I would be more defensive here and say that generic collection should >> have (collect:)do: implementation and only sequenceable collections have >> the optimized one (if it indeed is the case that it is a shotrhand for >> parenthesized one). >> >> > the different behaviour, this should be indicated by method name and >> > comment. >> > Second, if we remove asSet from the second snippet, the output is >> exactly >> > the same. It will be the same as long as the original collection has >> no >> > duplicates. Somehow the effect is to ignore the asSet. It just smells >> wrong. >> > >> > Peter Kenny >> >> Herby >> >> > Kasper Osterbye wrote >> >> The first version: >> >> >> >> (#(1 2 3) asSet collect: #odd) >> >> do: [ :each | Transcript show: each; cr ] >> >> >> >> is rather straight forward I believe, as collect: and do: has been >> around >> >> forever (relatively speaking). >> >> >> >> >> >> #(1 2 3) asSet collect: #odd >> >> thenDo: [ :each | Transcript show: each; cr ] >> >> >> >> >> >> On 8 September 2019 at 09.13.36, Richard Sargent ( >> > >> >> richard.sargent@ >> > >> >> ) wrote: >> >> >> >> I am skeptical of one that relies on a specific implementation >> rather >> >> than >> >> a specific definition. >> >> >> >> I share your feeling. I am not sure where such a definition would >> come >> >> from. In Squeak it is defined as: >> >> >> >> collect: collectBlock thenDo: doBlock >> >> >> >> ^ (self collect: collectBlock) do: doBlock >> >> >> >> In pharo as: >> >> >> >> collect: collectBlock thenDo: doBlock >> >> >> >> ^ self do: [ :each | doBlock value: (collectBlock value: each)] >> >> >> >> I might have called the method collect:eachDo:, but we each have >> slightly >> >> different styles. What I like about the pharo version is that is a >> >> shorthand for something which is not achieved by mere parentheses. >> >> >> >> Best, >> >> >> >> Kasper >> > >> > >> > >> > >> > >> > -- >> > Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html >> > >> >> >> |
In reply to this post by David T. Lewis
Digging a bit further, the change from the pointless implementation to the current implementation was made between Pharo 2.0 and Pharo 3.0 and the current definition was introduced into ST/X some time between 2010 and 2012, which is about the same time as Pharo. {collect,select,reject}:thenDo: were added to my library on 2004.09.14, exactly a day after the version now in Squeak. Which leads me to believe that I picked the method up from a previous version of Squeak and that some of the Squeak version history has been lost. By the way, it's not just Sets where the current Squeak definition is a pain in the posterior. Consider the following example. |o| o := OrderedCollection new. #[3 1 4 1 5 9] collect: [:byte | byte asFloat] thenDo: [:float | o addLast: float]. Transcript print: o; cr. This works fine in Pharo, in Smalltalk/X, and in my library. It raises an exception in Squeak. Or try #[3 1 4 1 5 9] collect: [:byte | byte odd ifTrue: [byte printString] ifFalse: [byte]] thenSelect: [:each | each isNumber] In Pharo and my library, it straightforwardly answers #[4]. In Squeak, it raises an exception. So that's sets where the current Squeak definition can give you trouble, and ByteArrays, and we are not done yet! Pretty much anything other than an Array or an OrderedCollection is going to get you in trouble in Squeak. Just think of the fun you can have with SortedCollection. Ooh, the list goes on and on. So, the Squeak definition * does not improve readability. In fact, it REDUCES readability because the human reader stops to ask "why is this portmanteau method used rather than the obvious code" and several minutes later discovers that there is NO reason. * is useless with most collection classes. * does nothing to improve performance but just adds overhead. Are we done yet? On Tue, 10 Sep 2019 at 01:27, David T. Lewis <[hidden email]> wrote: On Mon, Sep 09, 2019 at 05:46:48PM +1200, Richard O'Keefe wrote: |
In reply to this post by Steffen Märcker
An automatic refactoring rule that converts (a collect: b) do: c to (a collect: b thenDo: c) would change the semantics of the code in all of Squeak, Pharo, and Smalltalk/X, whichever definition of #collect:thenDo: is taken, and in Squeak would make the code slower, not faster. Using a different name *would* reduce confusion. However, there is an alternative. My library copied the #virtual{Collect,Select,Reject}: interface from Strongtalk. As the comment puts it, The idea is to do #select:, #reject:, #collect:, and some other collection->collection operations lazily. As long as you just use #do: &c, the operations will be referred back to the original collection, so that there is less need for fusion methods like #select:thenCollect:. ... You can do (((( coll virtualSelect: [..]) virtualCollect: [...]) virtualSelect: [..]) virtualCollect: [...]) asCollection and only *one* new collection is actually built. So (aCollection virtualCollect: transformBlock) do: actionBlock *does* build an intermediate object (with two instance variables) but *doesn't* build a whole intermediate collection. And that combination *could* be fused by an automatic rule without breaking anything. Does that sound like a way forward? I could convert my implementation of this interface to Pharo if people would like me to. On Tue, 10 Sep 2019 at 01:51, Steffen Märcker <[hidden email]> wrote: I think this thread indicates that the selector #collect:thenDo: may |
On 10 September 2019 at 00.56.28, Richard O'Keefe ([hidden email]) wrote:
Hi Richard, It seems like what you propose (in sofar as I understand it) is already present in at least three other libraries which have not yet made it to the core of Pharo: The Iterator framework Julien is working on: https://github.com/juliendelplanque/Iterators The XStreams framework: https://code.google.com/archive/p/xtreams/ The transducers: https://github.com/Pharophile/Transducers I was not aware of the virtual collections from Strongtalk. I am aware you know them (as you commented on the Iterators Julien proposed). I have been working with the LINQ stream/iterator framework in C#, and found that to be a rather sound library, lazy evaluation. The streams in Java are also nice, and are actually very good at exploiting multi cores. It is sad there is no lazy streams in the core of Pharo, and I would find it intersting to participate in a project to build one. Best, Kasper |
Hi,
I really think we should have an efficient way to chain operations on collections and alike. In my personal opinion, this should be backed by transducers, as they decouple the operations from the type the data source and sink. Hence they are applicable to different collection types, streams and so on. Also, transducers allow for straight-forward parallization of operations that are concurrency-safe. However, the transducers port is not complete yet. I think I'll finish it in October/November, but definitely this year. Maybe this could be a starting point to discuss how to move forward? One related question: Is there a somewhat comprehensive benchmark yielding a baseline of Pharo's collection/streams/iteration performance? I'd really like to compare the existing implemenation to the alternative approaches. Best regards, Steffen Am .09.2019, 11:30 Uhr, schrieb Kasper Østerbye <[hidden email]>: > On 10 September 2019 at 00.56.28, Richard O'Keefe ([hidden email]) > wrote: > > Does that sound like a way forward? > I could convert my implementation of this interface to > Pharo if people would like me to. > > Hi Richard, > > It seems like what you propose (in sofar as I understand it) is already > present in at least three other libraries which have not yet made it to > the > core of Pharo: > > The Iterator framework Julien is working on: > https://github.com/juliendelplanque/Iterators > > The XStreams framework: https://code.google.com/archive/p/xtreams/ > <https://code.google.com/archive/p/xtreams/> > > The transducers: https://github.com/Pharophile/Transducers > <https://github.com/Pharophile/Transducers> > > I was not aware of the virtual collections from Strongtalk. > > I am aware you know them (as you commented on the Iterators Julien > proposed). > > I have been working with the LINQ stream/iterator framework in C#, and > found that to be a rather sound library, lazy evaluation. The streams in > Java are also nice, and are actually very good at exploiting multi cores. > > It is sad there is no lazy streams in the core of Pharo, and I would find > it intersting to participate in a project to build one. > > Best, > > Kasper |
Free forum by Nabble | Edit this page |