Why is Heap>>#species => Array?

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
17 messages Options
Reply | Threaded
Open this post in threaded view
|

Why is Heap>>#species => Array?

Klaus D. Witzel
Subject line says it all, check yourself,

(Heap withAll: 'array') reject: [:x | x = $r]

What's the rationale (there's no doc, no comment)? Archive shows that  
#species was changed to fix another (anonymous) bug but, this way the  
senders of #species can impossibly do what Smalltalk users expect from the  
collection hierarchy (and there is #asArray ...)

-  
http://lists.squeakfoundation.org/pipermail/squeak-dev/2000-March/011043.html

/Klaus

P.S. Heap is used in not-so-uncritical parts, like #startTimerEventLoop  
and WorldState, so I don't want to "just" play with Heap's #species  
without any background info on what was fixed.


Reply | Threaded
Open this post in threaded view
|

[squeak-dev] Re: Why is Heap>>#species => Array?

Nicolas Cellier-3
Klaus D. Witzel a écrit :
> Subject line says it all, check yourself,
>
> (Heap withAll: 'array') reject: [:x | x = $r]
>
> What's the rationale (there's no doc, no comment)? Archive shows that
> #species was changed to fix another (anonymous) bug but, this way the
> senders of #species can impossibly do what Smalltalk users expect from
> the collection hierarchy (and there is #asArray ...)
>

I have not the background to answer you.
All I can tell is that this triggers another associativity failure:


        | anArray heap1 heap2 |
        anArray := #(1 2 3).
        heap1 := Heap withAll: (1 to: 3) sortBlock: [:a :b | a < b].
        heap2 := Heap withAll: (1 to: 3) sortBlock: [:a :b | b > a].
        (heap1 = anArray) & (heap2 = anArray) ==> (heap1 = heap2)

Heaps do compare their sortBlock.
They are more relax when comparing with their species.

This is minor, but note nice.

Nicolas

> -
> http://lists.squeakfoundation.org/pipermail/squeak-dev/2000-March/011043.html 
>
>
> /Klaus
>
> P.S. Heap is used in not-so-uncritical parts, like #startTimerEventLoop
> and WorldState, so I don't want to "just" play with Heap's #species
> without any background info on what was fixed.
>
>
>


Reply | Threaded
Open this post in threaded view
|

[squeak-dev] Re: Why is Heap>>#species => Array?

Nicolas Cellier-3
nicolas cellier a écrit :

>
> I have not the background to answer you.
> All I can tell is that this triggers another associativity failure:
>
>
>     | anArray heap1 heap2 |
>     anArray := #(1 2 3).
>     heap1 := Heap withAll: (1 to: 3) sortBlock: [:a :b | a < b].
>     heap2 := Heap withAll: (1 to: 3) sortBlock: [:a :b | b > a].
>     (heap1 = anArray) & (heap2 = anArray) ==> (heap1 = heap2)
>

opened http://bugs.squeak.org/view.php?id=6943 for this one


Reply | Threaded
Open this post in threaded view
|

[squeak-dev] Re: Why is Heap>>#species => Array?

Klaus D. Witzel
In reply to this post by Nicolas Cellier-3
On Thu, 21 Feb 2008 22:06:28 +0100, nicolas cellier wrote:

> Klaus D. Witzel a écrit :
>> Subject line says it all, check yourself,
>>  (Heap withAll: 'array') reject: [:x | x = $r]
>>  What's the rationale (there's no doc, no comment)? Archive shows that  
>> #species was changed to fix another (anonymous) bug but, this way the  
>> senders of #species can impossibly do what Smalltalk users expect from  
>> the collection hierarchy (and there is #asArray ...)
>>
>
> I have not the background to answer you.
> All I can tell is that this triggers another associativity failure:
>
>
> | anArray heap1 heap2 |
> anArray := #(1 2 3).
> heap1 := Heap withAll: (1 to: 3) sortBlock: [:a :b | a < b].
> heap2 := Heap withAll: (1 to: 3) sortBlock: [:a :b | b > a].
> (heap1 = anArray) & (heap2 = anArray) ==> (heap1 = heap2)
>
> Heaps do compare their sortBlock.

And no Array compares to a Heap, heap invariantly supers #= to  
SequenceableCollection.

> They are more relax when comparing with their species.

IC what you mean but, #species is invariant in their #= and their super #=.

> This is minor, but note nice.

It breaks all users of #species (#collect:, #reject:, #inject:, etc, all  
the powerful things in collection hierarchy); let's not call it minor then  
:)

> Nicolas
>
>> -  
>> http://lists.squeakfoundation.org/pipermail/squeak-dev/2000-March/011043.html 
>>   /Klaus
>>  P.S. Heap is used in not-so-uncritical parts, like  
>> #startTimerEventLoop and WorldState, so I don't want to "just" play  
>> with Heap's #species without any background info on what was fixed.
>>
>
>
>



Reply | Threaded
Open this post in threaded view
|

[squeak-dev] Re: Why is Heap>>#species => Array?

Nicolas Cellier-3
In reply to this post by Klaus D. Witzel
Klaus D. Witzel a écrit :
> Subject line says it all, check yourself,
>
> (Heap withAll: 'array') reject: [:x | x = $r]
>
> What's the rationale (there's no doc, no comment)? Archive shows that
> #species was changed to fix another (anonymous) bug but, this way the
> senders of #species can impossibly do what Smalltalk users expect from
> the collection hierarchy (and there is #asArray ...)
>


I think this is mainly because of collect:

Since collect: will potentially change class of elements, it would make
sortBlock: fail.


        | heap |
        heap := Heap withAll: (1 to: 10).
        self shouldnt: [heap collect: [:e | e even]] raise: Error.

Of course, another solution like adopted in SortedCollection would be to
define Heap>>collect: rather than Heap>>species.

But there might be other reasons...

Nicolas

> -
> http://lists.squeakfoundation.org/pipermail/squeak-dev/2000-March/011043.html 
>
>
> /Klaus
>
> P.S. Heap is used in not-so-uncritical parts, like #startTimerEventLoop
> and WorldState, so I don't want to "just" play with Heap's #species
> without any background info on what was fixed.
>
>
>


Reply | Threaded
Open this post in threaded view
|

[squeak-dev] self copyEmpty better than self species new [was: Why is Heap>>#species => Array?}

Klaus D. Witzel
In reply to this post by Nicolas Cellier-3
On Thu, 21 Feb 2008 22:19:43 +0100, nicolas cellier wrote:

...
>
> opened http://bugs.squeak.org/view.php?id=6943 for this one

Well, hm, I have as yet not thought about the subject line in that report.

But now I think that Heap #species problem is more subtle: (self species  
new) cannot pass the sortBlock of Heap (nor any other clone-invariant  
instVar) for users like #collect: and friends; therefore Array is passed  
by Heap #species.

But #copyEmpty can pass sortBlock and *whatsoever* to clones ;-) With  
#copyEmpty in place of (self species new), sending #species would be  
mainly in support of #= and #copyEmpty, throughout the collection  
hierarchy.

BTW hey Traits guys, did you discover something like the above during your  
reengineering of the collection hierarchy, that would be interesting to  
know !

>
>



Reply | Threaded
Open this post in threaded view
|

[squeak-dev] Re: Why is Heap>>#species => Array?

Klaus D. Witzel
In reply to this post by Nicolas Cellier-3
On Thu, 21 Feb 2008 22:56:15 +0100, nicolas cellier<wrote:

> Klaus D. Witzel a écrit :
>> Subject line says it all, check yourself,
>>  (Heap withAll: 'array') reject: [:x | x = $r]
>>  What's the rationale (there's no doc, no comment)? Archive shows that  
>> #species was changed to fix another (anonymous) bug but, this way the  
>> senders of #species can impossibly do what Smalltalk users expect from  
>> the collection hierarchy (and there is #asArray ...)
>>
>
>
> I think this is mainly because of collect:

That's about the same that I wrote about in the other message :)

> Since collect: will potentially change class of elements, it would make  
> sortBlock: fail.
>
>
> | heap |
> heap := Heap withAll: (1 to: 10).
> self shouldnt: [heap collect: [:e | e even]] raise: Error.
>
> Of course, another solution like adopted in SortedCollection would be to  
> define Heap>>collect: rather than Heap>>species.
>
> But there might be other reasons...

I don't think so, the March 2000 bug fix post talked about enumeration  
problem, I think we have it now (reconstructed the cause :) Good work  
Nicolas !

> Nicolas
>
>> -  
>> http://lists.squeakfoundation.org/pipermail/squeak-dev/2000-March/011043.html 
>>   /Klaus
>>  P.S. Heap is used in not-so-uncritical parts, like  
>> #startTimerEventLoop and WorldState, so I don't want to "just" play  
>> with Heap's #species without any background info on what was fixed.
>>
>
>
>



Reply | Threaded
Open this post in threaded view
|

[squeak-dev] Re: Why is Heap>>#species => Array?

Paolo Bonzini-2
In reply to this post by Klaus D. Witzel
Klaus D. Witzel wrote:
> Subject line says it all, check yourself,
>
> (Heap withAll: 'array') reject: [:x | x = $r]
>
> What's the rationale (there's no doc, no comment)? Archive shows that
> #species was changed to fix another (anonymous) bug but, this way the
> senders of #species can impossibly do what Smalltalk users expect from
> the collection hierarchy (and there is #asArray ...)

You answered yourself as to the solution, which is to use #copyEmpty
more even in Collection (especially in #select: and #reject:).  #species
should be used only for #collect: (and even then, it only works by
chance, i.e. because sorted collections' #collect: methods return
something for which "self species new" is good enough).

There shouldn't be any need to define any #...ect: method except in
Collection and ArrayedCollection.

Paolo


Reply | Threaded
Open this post in threaded view
|

[squeak-dev] Re: Why is Heap>>#species => Array?

Klaus D. Witzel
On Fri, 22 Feb 2008 06:22:32 +0100, Paolo Bonzini wrote:

> Klaus D. Witzel wrote:
>> Subject line says it all, check yourself,
>>  (Heap withAll: 'array') reject: [:x | x = $r]
>>  What's the rationale (there's no doc, no comment)? Archive shows that  
>> #species was changed to fix another (anonymous) bug but, this way the  
>> senders of #species can impossibly do what Smalltalk users expect from  
>> the collection hierarchy (and there is #asArray ...)
>
> You answered yourself as to the solution, which is to use #copyEmpty  
> more even in Collection (especially in #select: and #reject:).  #species  
> should be used only for #collect: (and even then, it only works by  
> chance, i.e. because sorted collections' #collect: methods return  
> something for which "self species new" is good enough).
>
> There shouldn't be any need to define any #...ect: method except in  
> Collection and ArrayedCollection.

Agreed and thanks for the feedback :)

> Paolo
>
>
>



Reply | Threaded
Open this post in threaded view
|

Re: [squeak-dev] Re: Why is Heap>>#species => Array?

Prof. Andrew P. Black

On 21 Feb 2008, at 23:49, Klaus D. Witzel wrote:

> On Fri, 22 Feb 2008 06:22:32 +0100, Paolo Bonzini wrote:
>
>> Klaus D. Witzel wrote:
>>> Subject line says it all, check yourself,
>>>  (Heap withAll: 'array') reject: [:x | x = $r]
>>>  What's the rationale (there's no doc, no comment)? Archive shows  
>>> that #species was changed to fix another (anonymous) bug but,  
>>> this way the senders of #species can impossibly do what Smalltalk  
>>> users expect from the collection hierarchy (and there is  
>>> #asArray ...)
>>

Klaus asked if we had any insights on this during the reengineering  
of the collection hierarchy.  The answer is yes, although my  
recollection of them may be imperfect.

We noticed that the use of #species was inconsistent between  
different classes.   We endeavored to fix these inconsistencies, to  
promote reuse of methods, and eliminate the multiplicity of slightly  
different versions.

We came to the conclusion that the #species method was intended for  
use in equality comparisons.  Two collections were equal if they had  
the same species and if they had the same elements.  Whether order  
matters depends on the species, so it matters if the species is  
Array, and not if it is Set.

However, some code used #species to answer a very different question:  
what class should I use to make a new collection like this one in a  
#collect: or a #select: ?    Sometimes this was OK, but sometimes the  
answer was different  from the answer that we got from #species.    
We decided to uniformly use two methods, emptyCopyOfSize: and  
emptyCopyOfSameSize , to generate the new collections.  
emptyCopyOfSameSize was implemented as

        ^ self  emptyCopyOfSize: self size

in TCollBasicImpl.

Using this, instead of

Set>>collect: aBlock
        "Evaluate aBlock with each of the receiver's elements as the argument.
        Collect the resulting values into a collection like the receiver.  
Answer
        the new collection."

        | newSet |
        newSet _ Set new: self size.
        array do: [:each | each ifNotNil: [newSet add: (aBlock value: each)]].
        ^ newSet

we get

TCollExtensibleUnsequence>>collect: aBlock
        "Evaluate aBlock with each of the receiver's elements as the argument.
        Collect the resulting values into a collection like the receiver.  
Answer
        the new collection."

        | newCollection |
        newCollection _ self emptyCopyOfSameSize.
        self withIndexDo: [:each :index |
                newCollection unsafeAdd: (aBlock value: each) possiblyAt: index].
        ^ newCollection makeSafe.

These methods also show another use of polymorphism: the method  
#unsafeAdd:possiblyAt:  The problem that this addresses is that some  
collections understand #at:put: (e.g, array) and others understand  
#add: (e.g., Set).  We made _all_ collections understand  
unsafeAdd:possiblyAt:  .  The second argument to this message is a  
_suggestion_ of an index at which to add the new element; the target  
can ignore this suggestion if it wants, as with a set.  The "unsafe"  
terminology has to do with internal invariants; a sorted collection  
can implement unsafeAdd:possiblyAt: to insert the new element at an  
arbitrary position, without sorting.  The rule is that eventually  
#makeSafe will be sent, and only _after_ that message is the user  
entitled to assume that the collection invariants will be once again  
true.  So sortedCollections can use it to sort; the default  
implementation does nothing and answers self.

I think that after we were done, we came to the conclusion that the  
kind of collection that results from a collect: ought to be a  
parameter to the collect.  For example, if I do a collect: over an  
IdentitySet, should the result be another IdentitySet, a Set, or an  
Array?  Well, it depends on the function that I'm applying: there is  
no one right answer.  Similarly, with collect: on a  
SortedCollection,  should the result be an OrderedCollection, an  
Array, or a new SortedCollection?  If the latter, with what sort  
block?    One way to do this is to have a #collect:into: method where  
the second argument is a collection into which the new elements will  
be added.  It's not even necessary for it to be empty!   Another  
possibility would be to provide as argument a block that does the  
adding ... this starts to look very much like #do:  So, we never  
implemented those variants.

        Andrew






Reply | Threaded
Open this post in threaded view
|

[squeak-dev] Re: Re: Why is Heap>>#species => Array?

Klaus D. Witzel
On Fri, 22 Feb 2008 23:25:54 +0100, Andrew P. Black wrote:

>
> On 21 Feb 2008, at 23:49, Klaus D. Witzel wrote:
>
>> On Fri, 22 Feb 2008 06:22:32 +0100, Paolo Bonzini wrote:
>>
>>> Klaus D. Witzel wrote:
>>>> Subject line says it all, check yourself,
>>>>  (Heap withAll: 'array') reject: [:x | x = $r]
>>>>  What's the rationale (there's no doc, no comment)? Archive shows  
>>>> that #species was changed to fix another (anonymous) bug but, this  
>>>> way the senders of #species can impossibly do what Smalltalk users  
>>>> expect from the collection hierarchy (and there is #asArray ...)
>>>
>
> Klaus asked if we had any insights on this during the reengineering of  
> the collection hierarchy.  The answer is yes, although my recollection  
> of them may be imperfect.
[...snipped a lot of insightful text...]

> I think that after we were done, we came to the conclusion that the kind  
> of collection that results from a collect: ought to be a parameter to  
> the collect.  For example, if I do a collect: over an IdentitySet,  
> should the result be another IdentitySet, a Set, or an Array?  Well, it  
> depends on the function that I'm applying: there is no one right  
> answer.  Similarly, with collect: on a SortedCollection,  should the  
> result be an OrderedCollection, an Array, or a new SortedCollection?  If  
> the latter, with what sort block?    One way to do this is to have a  
> #collect:into: method where the second argument is a collection into  
> which the new elements will be added.  It's not even necessary for it to  
> be empty!   Another possibility would be to provide as argument a block  
> that does the adding ... this starts to look very much like #do:  So, we  
> never implemented those variants.

Yeah, #collect:into: would give control over the resulting species to the  
developer; but I think it would be adopted rather reluctantly for existing  
apps (of course today's #collect: etc implementation could benefit from  
such factorization).

Thank you for the feedback !

/Klaus

> Andrew
>


Reply | Threaded
Open this post in threaded view
|

[squeak-dev] Re: Why is Heap>>#species => Array?

Paolo Bonzini-2
In reply to this post by Prof. Andrew P. Black

> However, some code used #species to answer a very different question:
> what class should I use to make a new collection like this one in a
> #collect: or a #select: ?    Sometimes this was OK, but sometimes the
> answer was different  from the answer that we got from #species.    We
> decided to uniformly use two methods, emptyCopyOfSize: and
> emptyCopyOfSameSize , to generate the new collections.  
> emptyCopyOfSameSize was implemented as
>
>     ^ self  emptyCopyOfSize: self size

This is what other Smalltalk usually call #copyEmpty and #copyEmpty:.

As I said, it works for everything except #reverse and #collect: for
which you can either use something like #copyEmptyForCollect, or resort
to #species.

Paolo

Reply | Threaded
Open this post in threaded view
|

Re: [squeak-dev] Re: Why is Heap>>#species => Array?

Prof. Andrew P. Black
We spent a long time trying to find good names (and not just for these methods).  As I recall, we rejected #copyEmpty because we felt that the programmer might reasonably expect the result of #copyEmpty to be empty, and, with Array for example,  it isn't going to be.

Hence the more descriptive names.   They are mostly used internally, where the extra length is no handicap.  On  the occasions when they are used externally, the extra length is  a benefit.


On 23 Feb 2008, at 0:07, Paolo Bonzini wrote:

 We decided to uniformly use two methods, emptyCopyOfSize: and emptyCopyOfSameSize , to generate the new collections.  emptyCopyOfSameSize was implemented as

    ^ self  emptyCopyOfSize: self size


This is what other Smalltalk usually call #copyEmpty and #copyEmpty:.


As I said, it works for everything except #reverse and #collect: for which you can either use something like #copyEmptyForCollect, or resort to #species.


Why doesn't this work for #reversed? 

For collect, I agree, the new collection may not be of the same class, and thus should not be called a copy; #emptyCollectionForCollect:  or #emptyForCollect: would seem to be suggestive names.

Shouldn't #reverse (the imperative form of the verb) be a mutator?   But I guess that's another issue

Professor Andrew P. Black
Department of Computer Science
Portland State University
+1 503 725 2411





Reply | Threaded
Open this post in threaded view
|

[squeak-dev] Re: Why is Heap>>#species => Array?

Paolo Bonzini-2

> We spent a long time trying to find good names (and not just for these
> methods).  As I recall, we rejected #copyEmpty because we felt that the
> programmer might reasonably expect the result of #copyEmpty to be empty,
> and, with Array for example,  it isn't going to be.

It's going to be empty in the sense that it is stuffed with nils --
that's as empty as an ArrayedCollection can get. :-)  However...

> Hence the more descriptive names.   They are mostly used internally,
> where the extra length is no handicap.  On  the occasions when they are
> used externally, the extra length is  a benefit.

... I agree with you (I was just pointing out the parallel with other
dialects in case people wanted to explore their implementation).

>> As I said, it works for everything except #reverse and #collect: for
>> which you can either use something like #copyEmptyForCollect, or
>> resort to #species.
>
> Why doesn't this work for #reversed?

Because SortedCollection's #reverse returns an OrderedCollection, like
#collect: does.

Paolo

Reply | Threaded
Open this post in threaded view
|

[squeak-dev] Re: Why is Heap>>#species => Array?

Nicolas Cellier-3
Paolo Bonzini a écrit :

>
>>
>> Why doesn't this work for #reversed?
>
> Because SortedCollection's #reverse returns an OrderedCollection, like
> #collect: does.
>
> Paolo
>
>

Why not

SortedCollection>>reversed
        ^self asSortedCollection: [:a :b | sortBlock value: b value: a]

Of course, that could be optimized, because we now we do not have to
sort. Create a new instance with these:

        newFirstIndex := self size - lastIndex + 1.
        newLastIndex := self size - firstIndex + 1.
        newArray := array reversed.
        newSortBlock := [:a :b | sortBlock value: b value: a].

Nicolas


Reply | Threaded
Open this post in threaded view
|

[squeak-dev] Re: Why is Heap>>#species => Array?

Nicolas Cellier-3
nicolas cellier a écrit :
>
>     newFirstIndex := self size - lastIndex + 1.
>     newLastIndex := self size - firstIndex + 1.
>

Stupid me:

      newFirstIndex := array size - lastIndex + 1.
      newLastIndex := array size - firstIndex + 1.



Reply | Threaded
Open this post in threaded view
|

[squeak-dev] Re: Why is Heap>>#species => Array?

Nicolas Cellier-3
In reply to this post by Nicolas Cellier-3
nicolas cellier a écrit :

> Why not
>
> SortedCollection>>reversed
>     ^self asSortedCollection: [:a :b | sortBlock value: b value: a]
>
> Of course, that could be optimized, because we now we do not have to
> sort. Create a new instance with these:
>

Now I know...
I know I'd better improve my English.

At http://bugs.squeak.org/view.php?id=6956 you will find a
SortedCollection reversed that is:
1) a SortedCollection
2) reversed

Maybe I was too conservative with sortBlock copy and fixTemps. Up to
more experienced to make it lighter.

Nicolas