IdentitySet>>collect:

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
46 messages Options
123
Reply | Threaded
Open this post in threaded view
|

Re: IdentitySet>>collect:

Bert Freudenberg
On 01.12.2014, at 07:35, Colin Putney <[hidden email]> wrote:

On Sun, Nov 30, 2014 at 6:20 AM, Bert Freudenberg <[hidden email]> wrote:
 
How would you define “type” here? Same class?

Yes, same class. It might be reasonable to answer instances of analogous classes for weak collections, and expect the caller to use #collect:as: if they want to retain weakness. 

Colin

That’s certainly simple:

 “collect: returns a collection of the same class as the receiver, except for weak collections, which answer non-weak equivalents. To return a collection of a different class, use collect:as:"

- Bert -




smime.p7s (5K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: AW: [squeak-dev] IdentitySet>>collect:

Eliot Miranda-2
In reply to this post by Ben Coman


On Fri, Nov 28, 2014 at 7:42 PM, Ben Coman <[hidden email]> wrote:
Levente Uzonyi wrote:
The problem with changing #species is that it's overused. There are at least three different things it's used for in the Collection hierarchy:
- the class of the object returned by #collect: (Set's subclasses are the exception since 2000 or so, except for WeakSet, but that's really strange, and should revisited)
- the class of the object returned by #select: and friends
- the class used for comparison of collections

So should there be #collectSpecies ?

No.  That's what species was invented for in the first place.
 
Which defaults to "^self species" and is overidden as required.

In the case of Sets, you start with an "unordered collection without duplicates", but since #collect can introduce duplicates, you end up with "an unordered collection with duplicates" --> Bag (per Dave's suggestion)

So I think its reasonable that the following two snippets should produce the same result.

oc := OrderedCollection newFrom: #( 1 2 3 4 5 ).
(oc collect: [ :e | e even ]) occurrencesOf:  true. "--> 2"

s := Set newFrom: #( 1 2 3 4 5 ).
(s collect: [ :e | e even ]) occurrencesOf:  true.  "--> 2"


which you get with the following change...

Set>>collectSpecies
        ^Bag

Set>>collect: aBlock
        | result |
        result := self collectSpecies new: self size.
        array do: [:each | each ifNotNil: [result add: (aBlock value: each enclosedSetElement)]].
        ^ result




HashedCollection >> #select: uses #species. Changing IdentitySet
#species to return a Set would break #select:, because a ~~ b doesn't
imply a ~= b. IdentitySet is the optimal class for the return value of IdentitySet >> #select:, because it's guaranteed to be able to hold all selected values. I think the same applies to all other kind of sets (and collections, because #select: is simply reducing the number of elements).

This is not the first time this discussion comes up. There was one in 2003[1], but the thread is hard to follow. And there was one in 2009[2].
I think the conclusion was that it's better to leave the method as it is, because there's no better way to do it.
Sure, we could use #species in #collect:, but it would break quite a lot of stuff. For example:

| k |
k := KeyedSet keyBlock: [ :each | each first ].
k add: #[1]; add: #[2].
k collect: [ :each | each size ]


That works with the above change to produce --> a Bag(1 1)


Currently this snippet returns "a Set(1)". Using #species in #collect: (via #copyEmpty) one would get an error, because SmallInteger does not understand #first. Changing the #species of KeyedSet would break #select:.

So IMHO when you #collect: from a set, you should always use #collect:as:. Here is a (partial) list of methods which send #collect: to a Set:

Categorizer >> #changeFromCategorySpecs:
ClassFactoryForTestCase >> #createdClassNames
Dictionary >> #unreferencedKeys
MCDependencySorterTest >> #assertItems:orderAs:withRequired:toLoad:extraProvisions:
MessageNode >> #analyseTempsWithin:rootNode:assignmentPools:
MethodNode >> #referencedValuesWithinBlockExtent:
SetTest >> #testCollect
SetWithNilTest >> #runSetWithNilTestOf:

There are probably many more, but we should fix all of them. A static analyzer could help a lot.

Levente

[1] http://lists.squeakfoundation.org/pipermail/squeak-dev/2003-February/052659.html
[2] http://lists.squeakfoundation.org/pipermail/squeak-dev/2009-November/141016.html



On Wed, 26 Nov 2014, David T. Lewis wrote:

It seems to me that Object>>species is intended to handle this sort of issue.

For IdentitySet, it answers what Eliot was expecting:

  IdentitySet new species ==> IdentitySet
  Set new species ==> Set

However, IdentitySet>>collect: does not make use of this, and it answers
a Set for the reasons that Levente explained.

Answering an Array or and OrderedCollection would not really make sense,
because sets are unordered collections (but Bag might be better).

Bag is better.


Shouldn't we have an implementation of IdentitySet>>species that answers
Set (or Bag), with a method comment explaining why this is the case, and
with all of the collection methods using #species to answer the right
kind of result?

I note that IdentitySet>>collect: answers a Set, but IdentitySet>select:
sends #species and therefore answers an IdentitySet.

So I think that if we want the #species of an IdentitySet to be a Set,
then we should make it so, and give it a method comment to explain the
rationale. And the #collect: and #select: methods should both answer a
result of that #species.

I think the species of #collect: and #select: are intrinsically different. #collect: is a transformation.


Dave


On Thu, Nov 27, 2014 at 01:14:30AM +0100, Levente Uzonyi wrote:
Your example hides the problem of ordering - what Tobias is asking about -
so here's another:

(IdentitySet withAll: #(1 1.0)) collect: [ :each | each class ]

If IdentitySet >> #collect: were returning an Array, then what would be the
answer?

{ SmallInteger. Float } or { Float. SmallInteger } ?

-->  a Bag(SmallInteger Float)


If you really want to have the resulting collection have the same size,
but avoid the problem with ordering, then what you really need is a Bag.

To me that makes the most sense.


On Thu, 27 Nov 2014, Frank Lesser wrote:

Hi Tobias,
agree, a problem of "OrderedCollection"
not to break a lot of other things we could return an Array.
but for me collecting has priority.
Frank

-----Urspr?ngliche Nachricht-----
Von: [hidden email]
[mailto:[hidden email]] Im Auftrag von
Tobias
Pape
Gesendet: Donnerstag, 27. November 2014 00:48
An: The general-purpose Squeak developers list
Betreff: Re: [squeak-dev] IdentitySet>>collect:


On 27.11.2014, at 00:34, Frank Lesser <[hidden email]>
wrote:

hmm, not convinced

(IdentitySet withAll: #(1 1.0)) collect: [:e| e asInteger ]
OrderedCollection(1 1 )

in LSWVST ( one-to-one ), you collect results of evaluating a block on
objects.

Frank
maybe I am wrong ...

Where would the order come from for that _Ordered_Collection?

An unordered OrderedCollection --> Bag

cheers -ben




--
best,
Eliot


Reply | Threaded
Open this post in threaded view
|

Re: IdentitySet>>collect:

Eliot Miranda-2
In reply to this post by Bert Freudenberg


On Mon, Dec 1, 2014 at 2:35 AM, Bert Freudenberg <[hidden email]> wrote:
On 01.12.2014, at 07:35, Colin Putney <[hidden email]> wrote:

On Sun, Nov 30, 2014 at 6:20 AM, Bert Freudenberg <[hidden email]> wrote:
 
How would you define “type” here? Same class?

Yes, same class. It might be reasonable to answer instances of analogous classes for weak collections, and expect the caller to use #collect:as: if they want to retain weakness. 

Colin

That’s certainly simple:

 “collect: returns a collection of the same class as the receiver, except for weak collections, which answer non-weak equivalents. To return a collection of a different class, use collect:as:"

except collect: can't possibly do this for Interval.  IMO, reify the idea of species in the comments for collect: and species and say that collect: answers something of the same species.  The idea being that species is the same as class when the receiver is a mutable collection, but some other suitable class when it doesn't. 



--
best,
Eliot


Reply | Threaded
Open this post in threaded view
|

Re: IdentitySet>>collect:

Chris Muller-3
On Mon, Dec 1, 2014 at 10:29 AM, Eliot Miranda <[hidden email]> wrote:

>
>
> On Mon, Dec 1, 2014 at 2:35 AM, Bert Freudenberg <[hidden email]>
> wrote:
>>
>> On 01.12.2014, at 07:35, Colin Putney <[hidden email]> wrote:
>>
>>
>> On Sun, Nov 30, 2014 at 6:20 AM, Bert Freudenberg <[hidden email]>
>> wrote:
>>
>>>
>>> How would you define “type” here? Same class?
>>
>>
>> Yes, same class. It might be reasonable to answer instances of analogous
>> classes for weak collections, and expect the caller to use #collect:as: if
>> they want to retain weakness.
>>
>> Colin
>>
>>
>> That’s certainly simple:
>>
>>  “collect: returns a collection of the same class as the receiver, except
>> for weak collections, which answer non-weak equivalents. To return a
>> collection of a different class, use collect:as:"
>
>
> except collect: can't possibly do this for Interval.  IMO, reify the idea of

Not only that, but the "except for weak collections" is a totally
arbitrary exception, because the collected objects may or may not be
referenced from elsewhere.

> species in the comments for collect: and species and say that collect:
> answers something of the same species.  The idea being that species is the
> same as class when the receiver is a mutable collection, but some other
> suitable class when it doesn't.

Reply | Threaded
Open this post in threaded view
|

Re: [Pharo-dev] Fwd: [squeak-dev] IdentitySet>>collect:

Eliot Miranda-2
In reply to this post by Bert Freudenberg


On Tue, Dec 2, 2014 at 1:20 AM, stepharo <[hidden email]> wrote:
What is the old behavior?

These are from Smalltalk-80 v2 of

Object methods for private
species
"Answer the preferred class for reconstructing the receiver.  For example, 
collections create new collections whenever enumeration messages such as 
collect: or select: are invoked.  The new kind of collection is determined by 
the species of the original collection.  Species and class are not always the 
same.  For example, the species of Interval is Array."

^self class

Collection methods for enumerating
collect: aBlock 
"Evaluate aBlock with each of the receiver's elements as the argument.  Collect the 
resulting values into a collection that is like the receiver.  Answer the new 
collection. "

| newCollection |
newCollection _ self species new.
self do: [:each | newCollection add: (aBlock value: each)].
^newCollection

select: aBlock 
"Evaluate aBlock with each of the receiver's elements as the argument. 
Collect into a new collection like the receiver, only those elements for which
aBlock evaluates to true.  Answer the new collection."

| newCollection |
newCollection _ self species new.
self do: [:each | (aBlock value: each) ifTrue: [newCollection add: each]].
^newCollection

Interval methods for private
species
^Array

But there are also already uses for comparing:

CharacterBlock methods for comparing
= aCharacterBlock 
self species = aCharacterBlock species
ifTrue: [^stringIndex = aCharacterBlock stringIndex]
ifFalse: [^false]

Interval methods for comparing
= anInterval 
"Answer true if my species and anInterval species are equal, and
if our starts, steps and sizes are equal."

self species == anInterval species
ifTrue: [^start = anInterval first
and: [step = anInterval increment and: [self size = anInterval size]]]
ifFalse: [^false]


However, the species comment clearly states the usage pattern, even though it fails to mention the intent (to provide a mutable container with which to derive a new collection from the original).

In addition it would be good to have tests for such core library.

+1
 

Stef

Le 1/12/14 11:44, Sven Van Caekenberghe a écrit :


Begin forwarded message:

From: Bert Freudenberg <[hidden email]>
Subject: Re: [squeak-dev] IdentitySet>>collect:
Date: 1 Dec 2014 11:35:00 CET
To: The general-purpose Squeak developers list <[hidden email]>
Reply-To: The general-purpose Squeak developers list <[hidden email]>

On 01.12.2014, at 07:35, Colin Putney <[hidden email]> wrote:

On Sun, Nov 30, 2014 at 6:20 AM, Bert Freudenberg <[hidden email]> wrote:
 
How would you define “type” here? Same class?

Yes, same class. It might be reasonable to answer instances of analogous classes for weak collections, and expect the caller to use #collect:as: if they want to retain weakness. 

Colin

That’s certainly simple:

 “collect: returns a collection of the same class as the receiver, except for weak collections, which answer non-weak equivalents. To return a collection of a different class, use collect:as:"

- Bert -

The above definition makes perfect sense to me. The default should be simple and predictable, the more complex and special cases can be handled by a more convoluted API.

And we should change callers that depend on the old behaviour.

Sven







--
best,
Eliot


Reply | Threaded
Open this post in threaded view
|

Re: [Pharo-dev] Fwd: [squeak-dev] IdentitySet>>collect:

Eliot Miranda-2
In reply to this post by Bert Freudenberg


On Tue, Dec 2, 2014 at 1:20 AM, stepharo <[hidden email]> wrote:
What is the old behavior?

These are from Smalltalk-80 v2 of April 1, 1983 on 31 May 1983 at 9:10:35 am

Object methods for private
species
"Answer the preferred class for reconstructing the receiver.  For example, 
collections create new collections whenever enumeration messages such as 
collect: or select: are invoked.  The new kind of collection is determined by 
the species of the original collection.  Species and class are not always the 
same.  For example, the species of Interval is Array."

^self class

Collection methods for enumerating
collect: aBlock 
"Evaluate aBlock with each of the receiver's elements as the argument.  Collect the 
resulting values into a collection that is like the receiver.  Answer the new 
collection. "

| newCollection |
newCollection _ self species new.
self do: [:each | newCollection add: (aBlock value: each)].
^newCollection

select: aBlock 
"Evaluate aBlock with each of the receiver's elements as the argument. 
Collect into a new collection like the receiver, only those elements for which
aBlock evaluates to true.  Answer the new collection."

| newCollection |
newCollection _ self species new.
self do: [:each | (aBlock value: each) ifTrue: [newCollection add: each]].
^newCollection

Interval methods for private
species
^Array

But there are also already uses for comparing:

CharacterBlock methods for comparing
= aCharacterBlock 
self species = aCharacterBlock species
ifTrue: [^stringIndex = aCharacterBlock stringIndex]
ifFalse: [^false]

Interval methods for comparing
= anInterval 
"Answer true if my species and anInterval species are equal, and
if our starts, steps and sizes are equal."

self species == anInterval species
ifTrue: [^start = anInterval first
and: [step = anInterval increment and: [self size = anInterval size]]]
ifFalse: [^false]


However, the species comment clearly states the usage pattern, even though it fails to mention the intent (to provide a mutable container with which to derive a new collection from the original).
 
In addition it would be good to have tests for such core library.

+ 1

Stef

Le 1/12/14 11:44, Sven Van Caekenberghe a écrit :


Begin forwarded message:

From: Bert Freudenberg <[hidden email]>
Subject: Re: [squeak-dev] IdentitySet>>collect:
Date: 1 Dec 2014 11:35:00 CET
To: The general-purpose Squeak developers list <[hidden email]>
Reply-To: The general-purpose Squeak developers list <[hidden email]>

On 01.12.2014, at 07:35, Colin Putney <[hidden email]> wrote:

On Sun, Nov 30, 2014 at 6:20 AM, Bert Freudenberg <[hidden email]> wrote:
 
How would you define “type” here? Same class?

Yes, same class. It might be reasonable to answer instances of analogous classes for weak collections, and expect the caller to use #collect:as: if they want to retain weakness. 

Colin

That’s certainly simple:

 “collect: returns a collection of the same class as the receiver, except for weak collections, which answer non-weak equivalents. To return a collection of a different class, use collect:as:"

- Bert -

The above definition makes perfect sense to me. The default should be simple and predictable, the more complex and special cases can be handled by a more convoluted API.

And we should change callers that depend on the old behaviour.

Sven







--
best,
Eliot


123