The Inbox: Collections-mt.938.mcz

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

The Inbox: Collections-mt.938.mcz

commits-2
A new version of Collections was added to project The Inbox:
http://source.squeak.org/inbox/Collections-mt.938.mcz

==================== Summary ====================

Name: Collections-mt.938
Author: mt
Time: 14 April 2021, 2:38:21.265628 pm
UUID: d109bde4-42a6-e840-803c-e4e80b98a74d
Ancestors: Collections-mt.937

Proposal. Add a #joinSeparatedBy: that does not end up as a string.

The current alternative would be #gather
(#(1 2 3 4 5) gather: [:each | {each . $x}]) allButLast.

Not sure whether a non-sequenceable version in Collection makes sense. We have had that discussion about #join etc. being not useful for uncertain output order.

Some benchmarks:

[ ((1 to: 1000) gather: [:num | {num . $x}]) allButLast ] bench.
'22,000 per second. 45.5 microseconds per run. 13.09738 % GC time.'

[ (1 to: 1000) collect: [:num | num] separatedBy: [$x] ] bench.
'26,000 per second. 38.5 microseconds per run. 1.73965 % GC time.'

=============== Diff against Collections-mt.937 ===============

Item was added:
+ ----- Method: Collection>>collect:separatedBy: (in category 'enumerating') -----
+ collect: elementBlock separatedBy: separatorBlock
+ "Evaluate the elementBlock for all elements in the receiver, and evaluate the separatorBlock between. Collect the resulting values of both blocks into a collection like the receiver. Answer the new collection."
+
+ | newCollection |
+ newCollection := self species new.
+ self
+ do: [:each | newCollection add: (elementBlock value: each)]
+ separatedBy: [:each | newCollection add: separatorBlock value].
+ ^ newCollection!

Item was added:
+ ----- Method: SequenceableCollection>>collect:separatedBy: (in category 'enumerating') -----
+ collect: elementBlock separatedBy: separatorBlock
+ "Evaluate the elementBlock for all elements in the receiver, and evaluate the separatorBlock between. Collect the resulting values of both blocks into a collection like the receiver. Answer the new collection."
+
+ | newCollection |
+ newCollection := self species new: self size *2 -1.
+ 1 to: self size do: [:index |
+ index = 1 ifFalse: [newCollection at: (index-1 *2) put: separatorBlock value].
+ newCollection at: (index-1 *2)+1 put: (elementBlock value: (self at: index))].
+ ^ newCollection!


Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: Collections-mt.938.mcz

Levente Uzonyi
Hi Marcel,

Due to the flaw of OrderedCollection >> #new:, the code won't work with
OrderedCollections (and its subclasses).
This is the same issue Tim found with OrderedCollection and
#streamContents: the other day.
The current workaround is to use #ofSize: instead of #new: to create a
non-empty instance with the given capacity.
It would be great to change OrderedCollection class >> #new: to return a
non-empty collection, and create a new method, e.g. #newWithCapacity: to
create an empty collection with the given capacity, but I don't think it's
possible to do that without breaking almost all existing programs out
there.

The code won't work with LinkedList either because of the same reason. So,
that class will need its own implementation.

The code also suffers from #collect:'s problem, which is its reliance on
#species. #species is used for way too many things, and a single method
cannot fit all those roles.
The fix for that is #collect:as:, which lets the user decide what the
class of the result should be.
So, the question is: should there be #collect:as:separatedBy: too?


Levente

On Wed, 14 Apr 2021, [hidden email] wrote:

> A new version of Collections was added to project The Inbox:
> http://source.squeak.org/inbox/Collections-mt.938.mcz
>
> ==================== Summary ====================
>
> Name: Collections-mt.938
> Author: mt
> Time: 14 April 2021, 2:38:21.265628 pm
> UUID: d109bde4-42a6-e840-803c-e4e80b98a74d
> Ancestors: Collections-mt.937
>
> Proposal. Add a #joinSeparatedBy: that does not end up as a string.
>
> The current alternative would be #gather
> (#(1 2 3 4 5) gather: [:each | {each . $x}]) allButLast.
>
> Not sure whether a non-sequenceable version in Collection makes sense. We have had that discussion about #join etc. being not useful for uncertain output order.
>
> Some benchmarks:
>
> [ ((1 to: 1000) gather: [:num | {num . $x}]) allButLast ] bench.
> '22,000 per second. 45.5 microseconds per run. 13.09738 % GC time.'
>
> [ (1 to: 1000) collect: [:num | num] separatedBy: [$x] ] bench.
> '26,000 per second. 38.5 microseconds per run. 1.73965 % GC time.'
>
> =============== Diff against Collections-mt.937 ===============
>
> Item was added:
> + ----- Method: Collection>>collect:separatedBy: (in category 'enumerating') -----
> + collect: elementBlock separatedBy: separatorBlock
> + "Evaluate the elementBlock for all elements in the receiver, and evaluate the separatorBlock between. Collect the resulting values of both blocks into a collection like the receiver. Answer the new collection."
> +
> + | newCollection |
> + newCollection := self species new.
> + self
> + do: [:each | newCollection add: (elementBlock value: each)]
> + separatedBy: [:each | newCollection add: separatorBlock value].
> + ^ newCollection!
>
> Item was added:
> + ----- Method: SequenceableCollection>>collect:separatedBy: (in category 'enumerating') -----
> + collect: elementBlock separatedBy: separatorBlock
> + "Evaluate the elementBlock for all elements in the receiver, and evaluate the separatorBlock between. Collect the resulting values of both blocks into a collection like the receiver. Answer the new collection."
> +
> + | newCollection |
> + newCollection := self species new: self size *2 -1.
> + 1 to: self size do: [:index |
> + index = 1 ifFalse: [newCollection at: (index-1 *2) put: separatorBlock value].
> + newCollection at: (index-1 *2)+1 put: (elementBlock value: (self at: index))].
> + ^ newCollection!

Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: Collections-mt.938.mcz

marcel.taeumel
Hi Levente.

> It would be great to change OrderedCollection class >> #new: to return a
> non-empty collection, and create a new method, e.g. #newWithCapacity: to
> create an empty collection with the given capacity, but I don't think it's
> possible to do that without breaking almost all existing programs out
> there.

I agree, given that most implementations of #new: cover array-like structures and thus expose the idea of "capacity" in client code. From personal experience, I tried, occasionally, to be clever ^_^ and pre-allocate some capacity in an OrderedCollection via #new:. Yet, I would not generalize this kind of anti-pattern. Maybe such a change would not be as severe as you fear. At least not for OrderedCollection.

(Collection allSubclasses
sorted: [:a :b | a name <= b name])
collect: [:each | each -> ([(each new: 10) size] ifError: [:m | m])]
as: OrderedDictionary.

Here are the results:

21x Error
27x 0
40x 10
1x 100 (Matrix)

So, it looks like that your feeling is right. For Trunk, we could easily check whether the #new: cases cover the "0" or the "10". I suspect the latter because Array.

For MethodDictionary, you added a #newForCapacity: in 2011. :-) 

So, the question is: should there be #collect:as:separatedBy: too?

Yeah, looks like it. I will add it to the inbox.

Best,
Marcel

Am 14.04.2021 19:17:55 schrieb Levente Uzonyi <[hidden email]>:

Hi Marcel,

Due to the flaw of OrderedCollection >> #new:, the code won't work with
OrderedCollections (and its subclasses).
This is the same issue Tim found with OrderedCollection and
#streamContents: the other day.
The current workaround is to use #ofSize: instead of #new: to create a
non-empty instance with the given capacity.
It would be great to change OrderedCollection class >> #new: to return a
non-empty collection, and create a new method, e.g. #newWithCapacity: to
create an empty collection with the given capacity, but I don't think it's
possible to do that without breaking almost all existing programs out
there.

The code won't work with LinkedList either because of the same reason. So,
that class will need its own implementation.

The code also suffers from #collect:'s problem, which is its reliance on
#species. #species is used for way too many things, and a single method
cannot fit all those roles.
The fix for that is #collect:as:, which lets the user decide what the
class of the result should be.
So, the question is: should there be #collect:as:separatedBy: too?


Levente

On Wed, 14 Apr 2021, [hidden email] wrote:

> A new version of Collections was added to project The Inbox:
> http://source.squeak.org/inbox/Collections-mt.938.mcz
>
> ==================== Summary ====================
>
> Name: Collections-mt.938
> Author: mt
> Time: 14 April 2021, 2:38:21.265628 pm
> UUID: d109bde4-42a6-e840-803c-e4e80b98a74d
> Ancestors: Collections-mt.937
>
> Proposal. Add a #joinSeparatedBy: that does not end up as a string.
>
> The current alternative would be #gather
> (#(1 2 3 4 5) gather: [:each | {each . $x}]) allButLast.
>
> Not sure whether a non-sequenceable version in Collection makes sense. We have had that discussion about #join etc. being not useful for uncertain output order.
>
> Some benchmarks:
>
> [ ((1 to: 1000) gather: [:num | {num . $x}]) allButLast ] bench.
> '22,000 per second. 45.5 microseconds per run. 13.09738 % GC time.'
>
> [ (1 to: 1000) collect: [:num | num] separatedBy: [$x] ] bench.
> '26,000 per second. 38.5 microseconds per run. 1.73965 % GC time.'
>
> =============== Diff against Collections-mt.937 ===============
>
> Item was added:
> + ----- Method: Collection>>collect:separatedBy: (in category 'enumerating') -----
> + collect: elementBlock separatedBy: separatorBlock
> + "Evaluate the elementBlock for all elements in the receiver, and evaluate the separatorBlock between. Collect the resulting values of both blocks into a collection like the receiver. Answer the new collection."
> +
> + | newCollection |
> + newCollection := self species new.
> + self
> + do: [:each | newCollection add: (elementBlock value: each)]
> + separatedBy: [:each | newCollection add: separatorBlock value].
> + ^ newCollection!
>
> Item was added:
> + ----- Method: SequenceableCollection>>collect:separatedBy: (in category 'enumerating') -----
> + collect: elementBlock separatedBy: separatorBlock
> + "Evaluate the elementBlock for all elements in the receiver, and evaluate the separatorBlock between. Collect the resulting values of both blocks into a collection like the receiver. Answer the new collection."
> +
> + | newCollection |
> + newCollection := self species new: self size *2 -1.
> + 1 to: self size do: [:index |
> + index = 1 ifFalse: [newCollection at: (index-1 *2) put: separatorBlock value].
> + newCollection at: (index-1 *2)+1 put: (elementBlock value: (self at: index))].
> + ^ newCollection!



Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: Collections-mt.938.mcz

timrowledge
It's worth remembering that that nextPutAl: etc code is only causing a problem in the 'special fast case' branch. The 'normal' case would be to iterate on the parameter and #add: to the recipient.  OrderedCollection does *that* ok. I must admit to never having looked at LinkedList>#add: ... yup that would be fine too. The only collection hierarchy class I see which doesn't like #add: is Matrix, and I'm not utterly sure that it ought to be under Collection

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Useful random insult:- People around her are at risk of second hand idiocy.



Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: Collections-mt.938.mcz

Levente Uzonyi
That's only the case, because the 'normal' branch has been fixed back in
2009 (see Collections-nice.116) to use #grownBy: and therefore #ofSize:
instead of #new: to create the new instance of the underlying collection.
Yet, we still don't have a test case convering this unusual usage of
#streamContents:.


Levente

On Thu, 15 Apr 2021, tim Rowledge wrote:

> It's worth remembering that that nextPutAl: etc code is only causing a problem in the 'special fast case' branch. The 'normal' case would be to iterate on the parameter and #add: to the recipient.  OrderedCollection does *that* ok. I must admit to never having looked at LinkedList>#add: ... yup that would be fine too. The only collection hierarchy class I see which doesn't like #add: is Matrix, and I'm not utterly sure that it ought to be under Collection
>
> tim
> --
> tim Rowledge; [hidden email]; http://www.rowledge.org/tim
> Useful random insult:- People around her are at risk of second hand idiocy.