A new version of Collections was added to project The Inbox:
http://source.squeak.org/inbox/Collections-mt.938.mcz ==================== Summary ==================== Name: Collections-mt.938 Author: mt Time: 14 April 2021, 2:38:21.265628 pm UUID: d109bde4-42a6-e840-803c-e4e80b98a74d Ancestors: Collections-mt.937 Proposal. Add a #joinSeparatedBy: that does not end up as a string. The current alternative would be #gather (#(1 2 3 4 5) gather: [:each | {each . $x}]) allButLast. Not sure whether a non-sequenceable version in Collection makes sense. We have had that discussion about #join etc. being not useful for uncertain output order. Some benchmarks: [ ((1 to: 1000) gather: [:num | {num . $x}]) allButLast ] bench. '22,000 per second. 45.5 microseconds per run. 13.09738 % GC time.' [ (1 to: 1000) collect: [:num | num] separatedBy: [$x] ] bench. '26,000 per second. 38.5 microseconds per run. 1.73965 % GC time.' =============== Diff against Collections-mt.937 =============== Item was added: + ----- Method: Collection>>collect:separatedBy: (in category 'enumerating') ----- + collect: elementBlock separatedBy: separatorBlock + "Evaluate the elementBlock for all elements in the receiver, and evaluate the separatorBlock between. Collect the resulting values of both blocks into a collection like the receiver. Answer the new collection." + + | newCollection | + newCollection := self species new. + self + do: [:each | newCollection add: (elementBlock value: each)] + separatedBy: [:each | newCollection add: separatorBlock value]. + ^ newCollection! Item was added: + ----- Method: SequenceableCollection>>collect:separatedBy: (in category 'enumerating') ----- + collect: elementBlock separatedBy: separatorBlock + "Evaluate the elementBlock for all elements in the receiver, and evaluate the separatorBlock between. Collect the resulting values of both blocks into a collection like the receiver. Answer the new collection." + + | newCollection | + newCollection := self species new: self size *2 -1. + 1 to: self size do: [:index | + index = 1 ifFalse: [newCollection at: (index-1 *2) put: separatorBlock value]. + newCollection at: (index-1 *2)+1 put: (elementBlock value: (self at: index))]. + ^ newCollection! |
Hi Marcel,
Due to the flaw of OrderedCollection >> #new:, the code won't work with OrderedCollections (and its subclasses). This is the same issue Tim found with OrderedCollection and #streamContents: the other day. The current workaround is to use #ofSize: instead of #new: to create a non-empty instance with the given capacity. It would be great to change OrderedCollection class >> #new: to return a non-empty collection, and create a new method, e.g. #newWithCapacity: to create an empty collection with the given capacity, but I don't think it's possible to do that without breaking almost all existing programs out there. The code won't work with LinkedList either because of the same reason. So, that class will need its own implementation. The code also suffers from #collect:'s problem, which is its reliance on #species. #species is used for way too many things, and a single method cannot fit all those roles. The fix for that is #collect:as:, which lets the user decide what the class of the result should be. So, the question is: should there be #collect:as:separatedBy: too? Levente On Wed, 14 Apr 2021, [hidden email] wrote: > A new version of Collections was added to project The Inbox: > http://source.squeak.org/inbox/Collections-mt.938.mcz > > ==================== Summary ==================== > > Name: Collections-mt.938 > Author: mt > Time: 14 April 2021, 2:38:21.265628 pm > UUID: d109bde4-42a6-e840-803c-e4e80b98a74d > Ancestors: Collections-mt.937 > > Proposal. Add a #joinSeparatedBy: that does not end up as a string. > > The current alternative would be #gather > (#(1 2 3 4 5) gather: [:each | {each . $x}]) allButLast. > > Not sure whether a non-sequenceable version in Collection makes sense. We have had that discussion about #join etc. being not useful for uncertain output order. > > Some benchmarks: > > [ ((1 to: 1000) gather: [:num | {num . $x}]) allButLast ] bench. > '22,000 per second. 45.5 microseconds per run. 13.09738 % GC time.' > > [ (1 to: 1000) collect: [:num | num] separatedBy: [$x] ] bench. > '26,000 per second. 38.5 microseconds per run. 1.73965 % GC time.' > > =============== Diff against Collections-mt.937 =============== > > Item was added: > + ----- Method: Collection>>collect:separatedBy: (in category 'enumerating') ----- > + collect: elementBlock separatedBy: separatorBlock > + "Evaluate the elementBlock for all elements in the receiver, and evaluate the separatorBlock between. Collect the resulting values of both blocks into a collection like the receiver. Answer the new collection." > + > + | newCollection | > + newCollection := self species new. > + self > + do: [:each | newCollection add: (elementBlock value: each)] > + separatedBy: [:each | newCollection add: separatorBlock value]. > + ^ newCollection! > > Item was added: > + ----- Method: SequenceableCollection>>collect:separatedBy: (in category 'enumerating') ----- > + collect: elementBlock separatedBy: separatorBlock > + "Evaluate the elementBlock for all elements in the receiver, and evaluate the separatorBlock between. Collect the resulting values of both blocks into a collection like the receiver. Answer the new collection." > + > + | newCollection | > + newCollection := self species new: self size *2 -1. > + 1 to: self size do: [:index | > + index = 1 ifFalse: [newCollection at: (index-1 *2) put: separatorBlock value]. > + newCollection at: (index-1 *2)+1 put: (elementBlock value: (self at: index))]. > + ^ newCollection! |
Hi Levente. > It would be great to change OrderedCollection class >> #new: to return a > non-empty collection, and create a new method, e.g. #newWithCapacity: to > create an empty collection with the given capacity, but I don't think it's > possible to do that without breaking almost all existing programs out > there. I agree, given that most implementations of #new: cover array-like structures and thus expose the idea of "capacity" in client code. From personal experience, I tried, occasionally, to be clever ^_^ and pre-allocate some capacity in an OrderedCollection via #new:. Yet, I would not generalize this kind of anti-pattern. Maybe such a change would not be as severe as you fear. At least not for OrderedCollection. (Collection allSubclasses sorted: [:a :b | a name <= b name]) collect: [:each | each -> ([(each new: 10) size] ifError: [:m | m])] as: OrderedDictionary. Here are the results: 21x Error 27x 0 40x 10 1x 100 (Matrix) So, it looks like that your feeling is right. For Trunk, we could easily check whether the #new: cases cover the "0" or the "10". I suspect the latter because Array. For MethodDictionary, you added a #newForCapacity: in 2011. :-) > So, the question is: should there be #collect:as:separatedBy: too? Yeah, looks like it. I will add it to the inbox. Best, Marcel
|
It's worth remembering that that nextPutAl: etc code is only causing a problem in the 'special fast case' branch. The 'normal' case would be to iterate on the parameter and #add: to the recipient. OrderedCollection does *that* ok. I must admit to never having looked at LinkedList>#add: ... yup that would be fine too. The only collection hierarchy class I see which doesn't like #add: is Matrix, and I'm not utterly sure that it ought to be under Collection
tim -- tim Rowledge; [hidden email]; http://www.rowledge.org/tim Useful random insult:- People around her are at risk of second hand idiocy. |
That's only the case, because the 'normal' branch has been fixed back in
2009 (see Collections-nice.116) to use #grownBy: and therefore #ofSize: instead of #new: to create the new instance of the underlying collection. Yet, we still don't have a test case convering this unusual usage of #streamContents:. Levente On Thu, 15 Apr 2021, tim Rowledge wrote: > It's worth remembering that that nextPutAl: etc code is only causing a problem in the 'special fast case' branch. The 'normal' case would be to iterate on the parameter and #add: to the recipient. OrderedCollection does *that* ok. I must admit to never having looked at LinkedList>#add: ... yup that would be fine too. The only collection hierarchy class I see which doesn't like #add: is Matrix, and I'm not utterly sure that it ought to be under Collection > > tim > -- > tim Rowledge; [hidden email]; http://www.rowledge.org/tim > Useful random insult:- People around her are at risk of second hand idiocy. |
Free forum by Nabble | Edit this page |