The Trunk: Collections-nice.820.mcz

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

The Trunk: Collections-nice.820.mcz

commits-2
Nicolas Cellier uploaded a new version of Collections to project The Trunk:
http://source.squeak.org/trunk/Collections-nice.820.mcz

==================== Summary ====================

Name: Collections-nice.820
Author: nice
Time: 12 February 2019, 11:56:35.262017 pm
UUID: bb383133-067c-4133-987b-c481a7de69c7
Ancestors: Collections-ul.819, Collections-cbc.813

Definitively abandon SequenceableCollection equality tests based on equal species.

Old behaviour can still be obtained thru hasEqualElements: but the default is to not try to support such trans-class equality feature because it is much too complex.

Particularly Interval are no more equal to Arrays with same sequence. We can thus optimize hash a bit more and fix the old bugs of equa objects with different hashes. Merge Collections-cbc.813 for this and rehashAll in postscript.

There are not so many classes concerned by this change, mainly RunArray, Interval and LinkedList:

Collection withAllSubclasses select: [:e | [e basicNew species ~= e] on: Error do: [false]]
-> an OrderedCollection(WeakRegistry LinkedList Interval ByteCharacterSet CharacterSetComplement LazyCharacterSet WideCharacterSet ShortRunArray Semaphore Mutex TextLineInterval WeakArray Monitor MCVersionName ByteSymbol WideSymbol)

We will have to change the tests that rely on such equality.

=============== Diff against Collections-ul.819 ===============

Item was changed:
  ----- Method: FloatArray>>= (in category 'comparing') -----
  = aFloatArray
- | length |
  <primitive: 'primitiveEqual' module: 'FloatArrayPlugin'>
+ ^super = aFloatArray!
- aFloatArray class = self class ifFalse: [^ false].
- length := self size.
- length = aFloatArray size ifFalse: [^ false].
- 1 to: self size do: [:i | (self at: i)
- = (aFloatArray at: i) ifFalse: [^ false]].
- ^ true!

Item was changed:
  ----- Method: Interval>>= (in category 'comparing') -----
  = anObject
-
  ^ self == anObject
+ or: [anObject isInterval
+ ifFalse: [super = anObject]
+ ifTrue:
+ [start = anObject first
+ and: [step = anObject increment
+ and: [self last = anObject last]]]]!
- ifTrue: [true]
- ifFalse: [anObject isInterval
- ifTrue: [start = anObject first
- and: [step = anObject increment
- and: [self last = anObject last]]]
- ifFalse: [super = anObject]]!

Item was changed:
  ----- Method: Interval>>hash (in category 'comparing') -----
  hash
  "Hash is reimplemented because = is implemented."
+         ^((start hash hashMultiply bitXor: self last hash) hashMultiply
+                 bitXor: self size)!
-
- ^(((start hash bitShift: 2)
- bitOr: stop hash)
- bitShift: 1)
- bitOr: self size!

Item was changed:
  ----- Method: RunArray>>= (in category 'comparing') -----
  = anObject
+ self == anObject ifTrue: [^ true].
- "Test if all my elements are equal to those of anObject"
-
  ^anObject class == self class
+ and:
- ifTrue: "Faster test between two RunArrays"
  [(runs hasEqualElements: anObject runs)
+ and: [values hasEqualElements: anObject values]]!
- and: [values hasEqualElements: anObject values]]
- ifFalse:
- [anObject isCollection and: [self hasEqualElements: anObject]]!

Item was changed:
  ----- Method: SequenceableCollection>>= (in category 'comparing') -----
  = otherCollection
  "Answer true if the receiver is equivalent to the otherCollection.
+ First test for identity, then rule out different class and sizes of
- First test for identity, then rule out different species and sizes of
  collections. As a last resort, examine each element of the receiver
  and the otherCollection."
 
  self == otherCollection ifTrue: [^ true].
+ self class == otherCollection class ifFalse: [^ false].
- self species == otherCollection species ifFalse: [^ false].
  ^ self hasEqualElements: otherCollection!

Item was changed:
  ----- Method: Symbol>>= (in category 'comparing') -----
  = aSymbol
  "Compare the receiver and aSymbol."
  self == aSymbol ifTrue: [^ true].
+ aSymbol isSymbol ifTrue: [^ false].
- self class == aSymbol class ifTrue: [^ false].
  "Use String comparison otherwise"
  ^ super = aSymbol!

Item was changed:
+ (PackageInfo named: 'Collections') postscript: 'Character initializeClassificationTable.
+ HashedCollection rehashAll.'!
- (PackageInfo named: 'Collections') postscript: 'Character initializeClassificationTable'!


cbc
Reply | Threaded
Open this post in threaded view
|

Re: The Trunk: Collections-nice.820.mcz

cbc
Thank you. This was an angle I hadn't thought of - but definitely opens up how to fix it correctly:
>Definitively abandon SequenceableCollection equality tests based on equal species.  

On Tue, Feb 12, 2019 at 2:56 PM <[hidden email]> wrote:
Nicolas Cellier uploaded a new version of Collections to project The Trunk:
http://source.squeak.org/trunk/Collections-nice.820.mcz

==================== Summary ====================

Name: Collections-nice.820
Author: nice
Time: 12 February 2019, 11:56:35.262017 pm
UUID: bb383133-067c-4133-987b-c481a7de69c7
Ancestors: Collections-ul.819, Collections-cbc.813

Definitively abandon SequenceableCollection equality tests based on equal species.

Old behaviour can still be obtained thru hasEqualElements: but the default is to not try to support such trans-class equality feature because it is much too complex.

Particularly Interval are no more equal to Arrays with same sequence. We can thus optimize hash a bit more and fix the old bugs of equa objects with different hashes. Merge Collections-cbc.813 for this and rehashAll in postscript.

There are not so many classes concerned by this change, mainly RunArray, Interval and LinkedList:

Collection withAllSubclasses select: [:e | [e basicNew species ~= e] on: Error do: [false]]
-> an OrderedCollection(WeakRegistry LinkedList Interval ByteCharacterSet CharacterSetComplement LazyCharacterSet WideCharacterSet ShortRunArray Semaphore Mutex TextLineInterval WeakArray Monitor MCVersionName ByteSymbol WideSymbol)

We will have to change the tests that rely on such equality.

=============== Diff against Collections-ul.819 ===============

Item was changed:
  ----- Method: FloatArray>>= (in category 'comparing') -----
  = aFloatArray
-       | length |
        <primitive: 'primitiveEqual' module: 'FloatArrayPlugin'>
+       ^super = aFloatArray!
-       aFloatArray class = self class ifFalse: [^ false].
-       length := self size.
-       length = aFloatArray size ifFalse: [^ false].
-       1 to: self size do: [:i | (self at: i)
-                       = (aFloatArray at: i) ifFalse: [^ false]].
-       ^ true!

Item was changed:
  ----- Method: Interval>>= (in category 'comparing') -----
  = anObject
-
        ^ self == anObject
+               or: [anObject isInterval
+                       ifFalse: [super = anObject]
+                       ifTrue:
+                               [start = anObject first
+                                and: [step = anObject increment
+                                and: [self last = anObject last]]]]!
-               ifTrue: [true]
-               ifFalse: [anObject isInterval
-                       ifTrue: [start = anObject first
-                               and: [step = anObject increment
-                                       and: [self last = anObject last]]]
-                       ifFalse: [super = anObject]]!

Item was changed:
  ----- Method: Interval>>hash (in category 'comparing') -----
  hash
        "Hash is reimplemented because = is implemented."
+         ^((start hash hashMultiply bitXor: self last hash) hashMultiply
+                 bitXor: self size)!
-
-       ^(((start hash bitShift: 2)
-               bitOr: stop hash)
-               bitShift: 1)
-               bitOr: self size!

Item was changed:
  ----- Method: RunArray>>= (in category 'comparing') -----
  = anObject
+       self == anObject ifTrue: [^ true].
-       "Test if all my elements are equal to those of anObject"
-
        ^anObject class == self class
+               and:
-               ifTrue: "Faster test between two RunArrays"
                        [(runs hasEqualElements: anObject runs)
+                        and: [values hasEqualElements: anObject values]]!
-                        and: [values hasEqualElements: anObject values]]
-               ifFalse:
-                       [anObject isCollection and: [self hasEqualElements: anObject]]!

Item was changed:
  ----- Method: SequenceableCollection>>= (in category 'comparing') -----
  = otherCollection
        "Answer true if the receiver is equivalent to the otherCollection.
+       First test for identity, then rule out different class and sizes of
-       First test for identity, then rule out different species and sizes of
        collections. As a last resort, examine each element of the receiver
        and the otherCollection."

        self == otherCollection ifTrue: [^ true].
+       self class == otherCollection class ifFalse: [^ false].
-       self species == otherCollection species ifFalse: [^ false].
        ^ self hasEqualElements: otherCollection!

Item was changed:
  ----- Method: Symbol>>= (in category 'comparing') -----
  = aSymbol
        "Compare the receiver and aSymbol."
        self == aSymbol ifTrue: [^ true].
+       aSymbol isSymbol ifTrue: [^ false].
-       self class == aSymbol class ifTrue: [^ false].
        "Use String comparison otherwise"
        ^ super = aSymbol!

Item was changed:
+ (PackageInfo named: 'Collections') postscript: 'Character initializeClassificationTable.
+ HashedCollection rehashAll.'!
- (PackageInfo named: 'Collections') postscript: 'Character initializeClassificationTable'!




Reply | Threaded
Open this post in threaded view
|

Re: The Trunk: Collections-nice.820.mcz

Nicolas Cellier
Chris Muller wrote in another thread:

> There was another change to earlier today that you may be interested
> in asking that question about too, since it changed 19-year old
> SequenceableCollection>>#= with a one-day old replacement and actually
> went into trunk.

Please note that this has been discussed more than a month ago, and the discussion clearly converged to this consensus:
- having Interval and Array being = is not an important feature
- it seriously impacts efficiency of Interval hash
- it causes long standing bugs that we carry for years (19 years or so)

What's the point of having Interval = Array, when we have OrderedCollection ~= Array at the same time?
A less arbitrary feature would be a generalization of eachOperand isSequenceable ==> testThatElementsAreSameSequence:
But it's both too hard to maintain (a bug factory), aand a problem for efficient implementation of some specialized collection.
At the end, it's maybe not so much a desirable feature and it's practically not used in trunk images, apart for writing shorter tests.
I'm not a big fan of C motto when used with extremism: you don't pay for what you don't buy.
But I have the feeling that it makes sense here.

BTW, I don't like  hasEqualElements: because I mentally associate elements to Set, and for me #(1 2 3)  hasEqualElements: #(3 1 2), just with a different order.
I would prefer something like isSameSequenceAs: which clearly tells about the ordering.

Le mer. 13 févr. 2019 à 17:21, Chris Cunningham <[hidden email]> a écrit :
Thank you. This was an angle I hadn't thought of - but definitely opens up how to fix it correctly:
>Definitively abandon SequenceableCollection equality tests based on equal species.  

On Tue, Feb 12, 2019 at 2:56 PM <[hidden email]> wrote:
Nicolas Cellier uploaded a new version of Collections to project The Trunk:
http://source.squeak.org/trunk/Collections-nice.820.mcz

==================== Summary ====================

Name: Collections-nice.820
Author: nice
Time: 12 February 2019, 11:56:35.262017 pm
UUID: bb383133-067c-4133-987b-c481a7de69c7
Ancestors: Collections-ul.819, Collections-cbc.813

Definitively abandon SequenceableCollection equality tests based on equal species.

Old behaviour can still be obtained thru hasEqualElements: but the default is to not try to support such trans-class equality feature because it is much too complex.

Particularly Interval are no more equal to Arrays with same sequence. We can thus optimize hash a bit more and fix the old bugs of equa objects with different hashes. Merge Collections-cbc.813 for this and rehashAll in postscript.

There are not so many classes concerned by this change, mainly RunArray, Interval and LinkedList:

Collection withAllSubclasses select: [:e | [e basicNew species ~= e] on: Error do: [false]]
-> an OrderedCollection(WeakRegistry LinkedList Interval ByteCharacterSet CharacterSetComplement LazyCharacterSet WideCharacterSet ShortRunArray Semaphore Mutex TextLineInterval WeakArray Monitor MCVersionName ByteSymbol WideSymbol)

We will have to change the tests that rely on such equality.

=============== Diff against Collections-ul.819 ===============

Item was changed:
  ----- Method: FloatArray>>= (in category 'comparing') -----
  = aFloatArray
-       | length |
        <primitive: 'primitiveEqual' module: 'FloatArrayPlugin'>
+       ^super = aFloatArray!
-       aFloatArray class = self class ifFalse: [^ false].
-       length := self size.
-       length = aFloatArray size ifFalse: [^ false].
-       1 to: self size do: [:i | (self at: i)
-                       = (aFloatArray at: i) ifFalse: [^ false]].
-       ^ true!

Item was changed:
  ----- Method: Interval>>= (in category 'comparing') -----
  = anObject
-
        ^ self == anObject
+               or: [anObject isInterval
+                       ifFalse: [super = anObject]
+                       ifTrue:
+                               [start = anObject first
+                                and: [step = anObject increment
+                                and: [self last = anObject last]]]]!
-               ifTrue: [true]
-               ifFalse: [anObject isInterval
-                       ifTrue: [start = anObject first
-                               and: [step = anObject increment
-                                       and: [self last = anObject last]]]
-                       ifFalse: [super = anObject]]!

Item was changed:
  ----- Method: Interval>>hash (in category 'comparing') -----
  hash
        "Hash is reimplemented because = is implemented."
+         ^((start hash hashMultiply bitXor: self last hash) hashMultiply
+                 bitXor: self size)!
-
-       ^(((start hash bitShift: 2)
-               bitOr: stop hash)
-               bitShift: 1)
-               bitOr: self size!

Item was changed:
  ----- Method: RunArray>>= (in category 'comparing') -----
  = anObject
+       self == anObject ifTrue: [^ true].
-       "Test if all my elements are equal to those of anObject"
-
        ^anObject class == self class
+               and:
-               ifTrue: "Faster test between two RunArrays"
                        [(runs hasEqualElements: anObject runs)
+                        and: [values hasEqualElements: anObject values]]!
-                        and: [values hasEqualElements: anObject values]]
-               ifFalse:
-                       [anObject isCollection and: [self hasEqualElements: anObject]]!

Item was changed:
  ----- Method: SequenceableCollection>>= (in category 'comparing') -----
  = otherCollection
        "Answer true if the receiver is equivalent to the otherCollection.
+       First test for identity, then rule out different class and sizes of
-       First test for identity, then rule out different species and sizes of
        collections. As a last resort, examine each element of the receiver
        and the otherCollection."

        self == otherCollection ifTrue: [^ true].
+       self class == otherCollection class ifFalse: [^ false].
-       self species == otherCollection species ifFalse: [^ false].
        ^ self hasEqualElements: otherCollection!

Item was changed:
  ----- Method: Symbol>>= (in category 'comparing') -----
  = aSymbol
        "Compare the receiver and aSymbol."
        self == aSymbol ifTrue: [^ true].
+       aSymbol isSymbol ifTrue: [^ false].
-       self class == aSymbol class ifTrue: [^ false].
        "Use String comparison otherwise"
        ^ super = aSymbol!

Item was changed:
+ (PackageInfo named: 'Collections') postscript: 'Character initializeClassificationTable.
+ HashedCollection rehashAll.'!
- (PackageInfo named: 'Collections') postscript: 'Character initializeClassificationTable'!





Reply | Threaded
Open this post in threaded view
|

Re: The Trunk: Collections-nice.820.mcz

Chris Muller-3
> > There was another change to earlier today that you may be interested
> > in asking that question about too, since it changed 19-year old
> > SequenceableCollection>>#= with a one-day old replacement and actually
> > went into trunk.
>
> Please note that this has been discussed more than a month ago, and the discussion clearly converged to this consensus:
> - having Interval and Array being = is not an important feature
> - it seriously impacts efficiency of Interval hash
> - it causes long standing bugs that we carry for years (19 years or so)

Yes, I remember that discussion.

> What's the point of having Interval = Array, when we have OrderedCollection ~= Array at the same time?
> A less arbitrary feature would be a generalization of eachOperand isSequenceable ==> testThatElementsAreSameSequence:
> But it's both too hard to maintain (a bug factory), aand a problem for efficient implementation of some specialized collection.
> At the end, it's maybe not so much a desirable feature and it's practically not used in trunk images, apart for writing shorter tests.
> I'm not a big fan of C motto when used with extremism: you don't pay for what you don't buy.
> But I have the feeling that it makes sense here.

I have no reason to critique this reasoning, I simply did not have
time to test with it yet.  Sometimes testing after tweaks at low
levels can "reveal" things that weren't thought about before.  But
also, what I really dislike are continued use == on the class test.
I'm sorry, but these kinds of hacks are killing Squeak's dynamism by a
thousand cuts and me along with it.  :(   Would you **Please
Delegate** equivalence tests (#=) to the receiver, and let them
inherit the identity implementation from Object?

 - Chris


 - Chris

> BTW, I don't like  hasEqualElements: because I mentally associate elements to Set, and for me #(1 2 3)  hasEqualElements: #(3 1 2), just with a different order.
> I would prefer something like isSameSequenceAs: which clearly tells about the ordering.
>
> Le mer. 13 févr. 2019 à 17:21, Chris Cunningham <[hidden email]> a écrit :
>>
>> Thank you. This was an angle I hadn't thought of - but definitely opens up how to fix it correctly:
>> >Definitively abandon SequenceableCollection equality tests based on equal species.
>>
>> On Tue, Feb 12, 2019 at 2:56 PM <[hidden email]> wrote:
>>>
>>> Nicolas Cellier uploaded a new version of Collections to project The Trunk:
>>> http://source.squeak.org/trunk/Collections-nice.820.mcz
>>>
>>> ==================== Summary ====================
>>>
>>> Name: Collections-nice.820
>>> Author: nice
>>> Time: 12 February 2019, 11:56:35.262017 pm
>>> UUID: bb383133-067c-4133-987b-c481a7de69c7
>>> Ancestors: Collections-ul.819, Collections-cbc.813
>>>
>>> Definitively abandon SequenceableCollection equality tests based on equal species.
>>>
>>> Old behaviour can still be obtained thru hasEqualElements: but the default is to not try to support such trans-class equality feature because it is much too complex.
>>>
>>> Particularly Interval are no more equal to Arrays with same sequence. We can thus optimize hash a bit more and fix the old bugs of equa objects with different hashes. Merge Collections-cbc.813 for this and rehashAll in postscript.
>>>
>>> There are not so many classes concerned by this change, mainly RunArray, Interval and LinkedList:
>>>
>>> Collection withAllSubclasses select: [:e | [e basicNew species ~= e] on: Error do: [false]]
>>> -> an OrderedCollection(WeakRegistry LinkedList Interval ByteCharacterSet CharacterSetComplement LazyCharacterSet WideCharacterSet ShortRunArray Semaphore Mutex TextLineInterval WeakArray Monitor MCVersionName ByteSymbol WideSymbol)
>>>
>>> We will have to change the tests that rely on such equality.
>>>
>>> =============== Diff against Collections-ul.819 ===============
>>>
>>> Item was changed:
>>>   ----- Method: FloatArray>>= (in category 'comparing') -----
>>>   = aFloatArray
>>> -       | length |
>>>         <primitive: 'primitiveEqual' module: 'FloatArrayPlugin'>
>>> +       ^super = aFloatArray!
>>> -       aFloatArray class = self class ifFalse: [^ false].
>>> -       length := self size.
>>> -       length = aFloatArray size ifFalse: [^ false].
>>> -       1 to: self size do: [:i | (self at: i)
>>> -                       = (aFloatArray at: i) ifFalse: [^ false]].
>>> -       ^ true!
>>>
>>> Item was changed:
>>>   ----- Method: Interval>>= (in category 'comparing') -----
>>>   = anObject
>>> -
>>>         ^ self == anObject
>>> +               or: [anObject isInterval
>>> +                       ifFalse: [super = anObject]
>>> +                       ifTrue:
>>> +                               [start = anObject first
>>> +                                and: [step = anObject increment
>>> +                                and: [self last = anObject last]]]]!
>>> -               ifTrue: [true]
>>> -               ifFalse: [anObject isInterval
>>> -                       ifTrue: [start = anObject first
>>> -                               and: [step = anObject increment
>>> -                                       and: [self last = anObject last]]]
>>> -                       ifFalse: [super = anObject]]!
>>>
>>> Item was changed:
>>>   ----- Method: Interval>>hash (in category 'comparing') -----
>>>   hash
>>>         "Hash is reimplemented because = is implemented."
>>> +         ^((start hash hashMultiply bitXor: self last hash) hashMultiply
>>> +                 bitXor: self size)!
>>> -
>>> -       ^(((start hash bitShift: 2)
>>> -               bitOr: stop hash)
>>> -               bitShift: 1)
>>> -               bitOr: self size!
>>>
>>> Item was changed:
>>>   ----- Method: RunArray>>= (in category 'comparing') -----
>>>   = anObject
>>> +       self == anObject ifTrue: [^ true].
>>> -       "Test if all my elements are equal to those of anObject"
>>> -
>>>         ^anObject class == self class
>>> +               and:
>>> -               ifTrue: "Faster test between two RunArrays"
>>>                         [(runs hasEqualElements: anObject runs)
>>> +                        and: [values hasEqualElements: anObject values]]!
>>> -                        and: [values hasEqualElements: anObject values]]
>>> -               ifFalse:
>>> -                       [anObject isCollection and: [self hasEqualElements: anObject]]!
>>>
>>> Item was changed:
>>>   ----- Method: SequenceableCollection>>= (in category 'comparing') -----
>>>   = otherCollection
>>>         "Answer true if the receiver is equivalent to the otherCollection.
>>> +       First test for identity, then rule out different class and sizes of
>>> -       First test for identity, then rule out different species and sizes of
>>>         collections. As a last resort, examine each element of the receiver
>>>         and the otherCollection."
>>>
>>>         self == otherCollection ifTrue: [^ true].
>>> +       self class == otherCollection class ifFalse: [^ false].
>>> -       self species == otherCollection species ifFalse: [^ false].
>>>         ^ self hasEqualElements: otherCollection!
>>>
>>> Item was changed:
>>>   ----- Method: Symbol>>= (in category 'comparing') -----
>>>   = aSymbol
>>>         "Compare the receiver and aSymbol."
>>>         self == aSymbol ifTrue: [^ true].
>>> +       aSymbol isSymbol ifTrue: [^ false].
>>> -       self class == aSymbol class ifTrue: [^ false].
>>>         "Use String comparison otherwise"
>>>         ^ super = aSymbol!
>>>
>>> Item was changed:
>>> + (PackageInfo named: 'Collections') postscript: 'Character initializeClassificationTable.
>>> + HashedCollection rehashAll.'!
>>> - (PackageInfo named: 'Collections') postscript: 'Character initializeClassificationTable'!
>>>
>>>
>>
>

Reply | Threaded
Open this post in threaded view
|

Re: The Trunk: Collections-nice.820.mcz

Levente Uzonyi
On Fri, 15 Feb 2019, Chris Muller wrote:

>>> There was another change to earlier today that you may be interested
>> > in asking that question about too, since it changed 19-year old
>> > SequenceableCollection>>#= with a one-day old replacement and actually
>> > went into trunk.
>>
>> Please note that this has been discussed more than a month ago, and the discussion clearly converged to this consensus:
>> - having Interval and Array being = is not an important feature
>> - it seriously impacts efficiency of Interval hash
>> - it causes long standing bugs that we carry for years (19 years or so)
>
> Yes, I remember that discussion.
>
>> What's the point of having Interval = Array, when we have OrderedCollection ~= Array at the same time?
>> A less arbitrary feature would be a generalization of eachOperand isSequenceable ==> testThatElementsAreSameSequence:
>> But it's both too hard to maintain (a bug factory), aand a problem for efficient implementation of some specialized collection.
>> At the end, it's maybe not so much a desirable feature and it's practically not used in trunk images, apart for writing shorter tests.
>> I'm not a big fan of C motto when used with extremism: you don't pay for what you don't buy.
>> But I have the feeling that it makes sense here.
>
> I have no reason to critique this reasoning, I simply did not have
> time to test with it yet.  Sometimes testing after tweaks at low
> levels can "reveal" things that weren't thought about before.  But
> also, what I really dislike are continued use == on the class test.
What is the use case which doesn't work with the

  foo class == bar class ifTrue: [ ... ]

pattern?

Levente

> I'm sorry, but these kinds of hacks are killing Squeak's dynamism by a
> thousand cuts and me along with it.  :(   Would you **Please
> Delegate** equivalence tests (#=) to the receiver, and let them
> inherit the identity implementation from Object?
>
> - Chris
>
>
> - Chris
>
>> BTW, I don't like  hasEqualElements: because I mentally associate elements to Set, and for me #(1 2 3)  hasEqualElements: #(3 1 2), just with a different order.
>> I would prefer something like isSameSequenceAs: which clearly tells about the ordering.
>>
>> Le mer. 13 févr. 2019 à 17:21, Chris Cunningham <[hidden email]> a écrit :
>>>
>>> Thank you. This was an angle I hadn't thought of - but definitely opens up how to fix it correctly:
>>> >Definitively abandon SequenceableCollection equality tests based on equal species.
>>>
>>> On Tue, Feb 12, 2019 at 2:56 PM <[hidden email]> wrote:
>>>>
>>>> Nicolas Cellier uploaded a new version of Collections to project The Trunk:
>>>> http://source.squeak.org/trunk/Collections-nice.820.mcz
>>>>
>>>> ==================== Summary ====================
>>>>
>>>> Name: Collections-nice.820
>>>> Author: nice
>>>> Time: 12 February 2019, 11:56:35.262017 pm
>>>> UUID: bb383133-067c-4133-987b-c481a7de69c7
>>>> Ancestors: Collections-ul.819, Collections-cbc.813
>>>>
>>>> Definitively abandon SequenceableCollection equality tests based on equal species.
>>>>
>>>> Old behaviour can still be obtained thru hasEqualElements: but the default is to not try to support such trans-class equality feature because it is much too complex.
>>>>
>>>> Particularly Interval are no more equal to Arrays with same sequence. We can thus optimize hash a bit more and fix the old bugs of equa objects with different hashes. Merge Collections-cbc.813 for this and rehashAll in postscript.
>>>>
>>>> There are not so many classes concerned by this change, mainly RunArray, Interval and LinkedList:
>>>>
>>>> Collection withAllSubclasses select: [:e | [e basicNew species ~= e] on: Error do: [false]]
>>>> -> an OrderedCollection(WeakRegistry LinkedList Interval ByteCharacterSet CharacterSetComplement LazyCharacterSet WideCharacterSet ShortRunArray Semaphore Mutex TextLineInterval WeakArray Monitor MCVersionName ByteSymbol WideSymbol)
>>>>
>>>> We will have to change the tests that rely on such equality.
>>>>
>>>> =============== Diff against Collections-ul.819 ===============
>>>>
>>>> Item was changed:
>>>>   ----- Method: FloatArray>>= (in category 'comparing') -----
>>>>   = aFloatArray
>>>> -       | length |
>>>>         <primitive: 'primitiveEqual' module: 'FloatArrayPlugin'>
>>>> +       ^super = aFloatArray!
>>>> -       aFloatArray class = self class ifFalse: [^ false].
>>>> -       length := self size.
>>>> -       length = aFloatArray size ifFalse: [^ false].
>>>> -       1 to: self size do: [:i | (self at: i)
>>>> -                       = (aFloatArray at: i) ifFalse: [^ false]].
>>>> -       ^ true!
>>>>
>>>> Item was changed:
>>>>   ----- Method: Interval>>= (in category 'comparing') -----
>>>>   = anObject
>>>> -
>>>>         ^ self == anObject
>>>> +               or: [anObject isInterval
>>>> +                       ifFalse: [super = anObject]
>>>> +                       ifTrue:
>>>> +                               [start = anObject first
>>>> +                                and: [step = anObject increment
>>>> +                                and: [self last = anObject last]]]]!
>>>> -               ifTrue: [true]
>>>> -               ifFalse: [anObject isInterval
>>>> -                       ifTrue: [start = anObject first
>>>> -                               and: [step = anObject increment
>>>> -                                       and: [self last = anObject last]]]
>>>> -                       ifFalse: [super = anObject]]!
>>>>
>>>> Item was changed:
>>>>   ----- Method: Interval>>hash (in category 'comparing') -----
>>>>   hash
>>>>         "Hash is reimplemented because = is implemented."
>>>> +         ^((start hash hashMultiply bitXor: self last hash) hashMultiply
>>>> +                 bitXor: self size)!
>>>> -
>>>> -       ^(((start hash bitShift: 2)
>>>> -               bitOr: stop hash)
>>>> -               bitShift: 1)
>>>> -               bitOr: self size!
>>>>
>>>> Item was changed:
>>>>   ----- Method: RunArray>>= (in category 'comparing') -----
>>>>   = anObject
>>>> +       self == anObject ifTrue: [^ true].
>>>> -       "Test if all my elements are equal to those of anObject"
>>>> -
>>>>         ^anObject class == self class
>>>> +               and:
>>>> -               ifTrue: "Faster test between two RunArrays"
>>>>                         [(runs hasEqualElements: anObject runs)
>>>> +                        and: [values hasEqualElements: anObject values]]!
>>>> -                        and: [values hasEqualElements: anObject values]]
>>>> -               ifFalse:
>>>> -                       [anObject isCollection and: [self hasEqualElements: anObject]]!
>>>>
>>>> Item was changed:
>>>>   ----- Method: SequenceableCollection>>= (in category 'comparing') -----
>>>>   = otherCollection
>>>>         "Answer true if the receiver is equivalent to the otherCollection.
>>>> +       First test for identity, then rule out different class and sizes of
>>>> -       First test for identity, then rule out different species and sizes of
>>>>         collections. As a last resort, examine each element of the receiver
>>>>         and the otherCollection."
>>>>
>>>>         self == otherCollection ifTrue: [^ true].
>>>> +       self class == otherCollection class ifFalse: [^ false].
>>>> -       self species == otherCollection species ifFalse: [^ false].
>>>>         ^ self hasEqualElements: otherCollection!
>>>>
>>>> Item was changed:
>>>>   ----- Method: Symbol>>= (in category 'comparing') -----
>>>>   = aSymbol
>>>>         "Compare the receiver and aSymbol."
>>>>         self == aSymbol ifTrue: [^ true].
>>>> +       aSymbol isSymbol ifTrue: [^ false].
>>>> -       self class == aSymbol class ifTrue: [^ false].
>>>>         "Use String comparison otherwise"
>>>>         ^ super = aSymbol!
>>>>
>>>> Item was changed:
>>>> + (PackageInfo named: 'Collections') postscript: 'Character initializeClassificationTable.
>>>> + HashedCollection rehashAll.'!
>>>> - (PackageInfo named: 'Collections') postscript: 'Character initializeClassificationTable'!
>>>>
>>>>
>>>
>>

Reply | Threaded
Open this post in threaded view
|

Re: The Trunk: Collections-nice.820.mcz

Chris Muller-4
> >>> There was another change to earlier today that you may be interested
> >> > in asking that question about too, since it changed 19-year old
> >> > SequenceableCollection>>#= with a one-day old replacement and actually
> >> > went into trunk.
> >>
> >> Please note that this has been discussed more than a month ago, and the discussion clearly converged to this consensus:
> >> - having Interval and Array being = is not an important feature
> >> - it seriously impacts efficiency of Interval hash
> >> - it causes long standing bugs that we carry for years (19 years or so)
> >
> > Yes, I remember that discussion.
> >
> >> What's the point of having Interval = Array, when we have OrderedCollection ~= Array at the same time?
> >> A less arbitrary feature would be a generalization of eachOperand isSequenceable ==> testThatElementsAreSameSequence:
> >> But it's both too hard to maintain (a bug factory), aand a problem for efficient implementation of some specialized collection.
> >> At the end, it's maybe not so much a desirable feature and it's practically not used in trunk images, apart for writing shorter tests.
> >> I'm not a big fan of C motto when used with extremism: you don't pay for what you don't buy.
> >> But I have the feeling that it makes sense here.
> >
> > I have no reason to critique this reasoning, I simply did not have
> > time to test with it yet.  Sometimes testing after tweaks at low
> > levels can "reveal" things that weren't thought about before.  But
> > also, what I really dislike are continued use == on the class test.
>
> What is the use case which doesn't work with the
>
>         foo class == bar class ifTrue: [ ... ]
>
> pattern?

Magma uses Avi Bryant's WriteBarrier which depends on that.  In the
example above, foo's and/or bar's class were replaced by an anonymous
subclass which WOULD compare #= to its superclass, if only the caller
wouldn't subvert its chance by using an identity test.

The result is at least the 2nd-worst type of application bug -- the
kind that execute incorrectly, but silently (hopefully not resulting
in corrupt data!  but possible..)

Guys, I used to sprinkle #== sends for "performance" too, a habit I
learned early-on, but development and use of Magma case gave me the
context to understand why its wrong from an OO perspective, and why
I'm pleading with you to readjust your development pattern on this.
Please only send #== when an identity check is actually needed (which
is almost never).  Doing it for "performance" ends up being very
destructive to Magma applications.

Best,
  Chris


  - Chris

Reply | Threaded
Open this post in threaded view
|

Re: The Trunk: Collections-nice.820.mcz

Levente Uzonyi
On Fri, 15 Feb 2019, Chris Muller wrote:

>>>>> There was another change to earlier today that you may be interested
>>>>> in asking that question about too, since it changed 19-year old
>>>>> SequenceableCollection>>#= with a one-day old replacement and actually
>>>>> went into trunk.
>>>>
>>>> Please note that this has been discussed more than a month ago, and the discussion clearly converged to this consensus:
>>>> - having Interval and Array being = is not an important feature
>>>> - it seriously impacts efficiency of Interval hash
>>>> - it causes long standing bugs that we carry for years (19 years or so)
>>>
>>> Yes, I remember that discussion.
>>>
>>>> What's the point of having Interval = Array, when we have OrderedCollection ~= Array at the same time?
>>>> A less arbitrary feature would be a generalization of eachOperand isSequenceable ==> testThatElementsAreSameSequence:
>>>> But it's both too hard to maintain (a bug factory), aand a problem for efficient implementation of some specialized collection.
>>>> At the end, it's maybe not so much a desirable feature and it's practically not used in trunk images, apart for writing shorter tests.
>>>> I'm not a big fan of C motto when used with extremism: you don't pay for what you don't buy.
>>>> But I have the feeling that it makes sense here.
>>>
>>> I have no reason to critique this reasoning, I simply did not have
>>> time to test with it yet.  Sometimes testing after tweaks at low
>>> levels can "reveal" things that weren't thought about before.  But
>>> also, what I really dislike are continued use == on the class test.
>>
>> What is the use case which doesn't work with the
>>
>>         foo class == bar class ifTrue: [ ... ]
>>
>> pattern?
>
> Magma uses Avi Bryant's WriteBarrier which depends on that.  In the
> example above, foo's and/or bar's class were replaced by an anonymous
> subclass which WOULD compare #= to its superclass, if only the caller
> wouldn't subvert its chance by using an identity test.
>
> The result is at least the 2nd-worst type of application bug -- the
> kind that execute incorrectly, but silently (hopefully not resulting
> in corrupt data!  but possible..)
>
> Guys, I used to sprinkle #== sends for "performance" too, a habit I
> learned early-on, but development and use of Magma case gave me the
> context to understand why its wrong from an OO perspective, and why
> I'm pleading with you to readjust your development pattern on this.
> Please only send #== when an identity check is actually needed (which
> is almost never).  Doing it for "performance" ends up being very
> destructive to Magma applications.

So WriteBarrier breaks two features of the system:
1) #class will return the receiver's class
2) classes are unique in the system

It seems to me that WriteBarrier is at fault, and it should do its best to
fix it:
- disable #== and #class bytecodes when loaded (make the encoder not
generate them and recompile existing senders)
- implement #== on Behavior to properly handle mangling with classes

Levente

>
> Best,
>  Chris
>
>
>  - Chris
>

Reply | Threaded
Open this post in threaded view
|

Re: The Trunk: Collections-nice.820.mcz

David T. Lewis
On Sat, Feb 16, 2019 at 03:19:37AM +0100, Levente Uzonyi wrote:

> On Fri, 15 Feb 2019, Chris Muller wrote:
>
> >>>>>There was another change to earlier today that you may be interested
> >>>>>in asking that question about too, since it changed 19-year old
> >>>>>SequenceableCollection>>#= with a one-day old replacement and actually
> >>>>>went into trunk.
> >>>>
> >>>>Please note that this has been discussed more than a month ago, and the
> >>>>discussion clearly converged to this consensus:
> >>>>- having Interval and Array being = is not an important feature
> >>>>- it seriously impacts efficiency of Interval hash
> >>>>- it causes long standing bugs that we carry for years (19 years or so)
> >>>
> >>>Yes, I remember that discussion.
> >>>
> >>>>What's the point of having Interval = Array, when we have
> >>>>OrderedCollection ~= Array at the same time?
> >>>>A less arbitrary feature would be a generalization of eachOperand
> >>>>isSequenceable ==> testThatElementsAreSameSequence:
> >>>>But it's both too hard to maintain (a bug factory), aand a problem for
> >>>>efficient implementation of some specialized collection.
> >>>>At the end, it's maybe not so much a desirable feature and it's
> >>>>practically not used in trunk images, apart for writing shorter tests.
> >>>>I'm not a big fan of C motto when used with extremism: you don't pay
> >>>>for what you don't buy.
> >>>>But I have the feeling that it makes sense here.
> >>>
> >>>I have no reason to critique this reasoning, I simply did not have
> >>>time to test with it yet.  Sometimes testing after tweaks at low
> >>>levels can "reveal" things that weren't thought about before.  But
> >>>also, what I really dislike are continued use == on the class test.
> >>
> >>What is the use case which doesn't work with the
> >>
> >>        foo class == bar class ifTrue: [ ... ]
> >>
> >>pattern?
> >
> >Magma uses Avi Bryant's WriteBarrier which depends on that.  In the
> >example above, foo's and/or bar's class were replaced by an anonymous
> >subclass which WOULD compare #= to its superclass, if only the caller
> >wouldn't subvert its chance by using an identity test.
> >
> >The result is at least the 2nd-worst type of application bug -- the
> >kind that execute incorrectly, but silently (hopefully not resulting
> >in corrupt data!  but possible..)
> >
> >Guys, I used to sprinkle #== sends for "performance" too, a habit I
> >learned early-on, but development and use of Magma case gave me the
> >context to understand why its wrong from an OO perspective, and why
> >I'm pleading with you to readjust your development pattern on this.
> >Please only send #== when an identity check is actually needed (which
> >is almost never).  Doing it for "performance" ends up being very
> >destructive to Magma applications.
>
> So WriteBarrier breaks two features of the system:
> 1) #class will return the receiver's class
> 2) classes are unique in the system
>
> It seems to me that WriteBarrier is at fault, and it should do its best to
> fix it:
> - disable #== and #class bytecodes when loaded (make the encoder not
> generate them and recompile existing senders)
> - implement #== on Behavior to properly handle mangling with classes
>

I put Collections-dtl.821 in the inbox. The change is trivial, and it should
address the issue that Chris identified with no meaningful performance impact.

   Name: Collections-dtl.821
   Time: 16 February 2019, 6:18:27.925895 pm

   Classes are expected to be unique in the system, but in some cases
   (e.g. Magma) it is also useful to expect a proxy for a class to test
   equivalent to the actual class. Therefore, in SequenceableCollection>>=
   use #= rather than #== for the class comparison.


Dave