#= ==> #hash issues

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
31 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Re: #= ==> #hash issues

Chris Muller-3
> Isn't this extremely simple to fix?
>
> #= is implemented in terms of start, step, and last.
>
> So why not implement #hash as
>
> ^(start hash bitXor: step hash) bitXor: self last hash
>
> Then in the postscript do a
>
> HashedCollection rehashAll
>
> and that should be it, right?
>
> I did a quick check using
>
> Dictionary allInstances select: [ :dict | dict keys anySatisfy: #isInterval]
>
> and the system seems fine after that.

You forgot about the universe that lies beyond the fringes of the
image.  There be persistent data files out there.  And users.
Possibly with systems that rely on Interval>>#hash.

But since most applications don't use non-Integer based Intervals
(since Interval doesn't really support them, by design, I guess),
there's no reason whatsoever to decimate the universe when Eliot's
simply fixes the corner case that no one is using anyway.

+1 to Eliots suggestion to fix Interval>>#hash.

 - Chris

Reply | Threaded
Open this post in threaded view
|

#= ==> #hash issues

Louis LaBrunda
Hi Guys,

I don't work for Instantiations, so this decision isn't mine to make.  That said, I have to agree with their desire to
be cautious.  There is no up side to them to change this and even though the down side should be small, there is no real
way of knowing how big or small it is.

Lou

On Thu, 15 Nov 2018 15:32:09 -0600, Chris Muller <[hidden email]> wrote:

>> Isn't this extremely simple to fix?
>>
>> #= is implemented in terms of start, step, and last.
>>
>> So why not implement #hash as
>>
>> ^(start hash bitXor: step hash) bitXor: self last hash
>>
>> Then in the postscript do a
>>
>> HashedCollection rehashAll
>>
>> and that should be it, right?
>>
>> I did a quick check using
>>
>> Dictionary allInstances select: [ :dict | dict keys anySatisfy: #isInterval]
>>
>> and the system seems fine after that.
>
>You forgot about the universe that lies beyond the fringes of the
>image.  There be persistent data files out there.  And users.
>Possibly with systems that rely on Interval>>#hash.
>
>But since most applications don't use non-Integer based Intervals
>(since Interval doesn't really support them, by design, I guess),
>there's no reason whatsoever to decimate the universe when Eliot's
>simply fixes the corner case that no one is using anyway.
>
>+1 to Eliots suggestion to fix Interval>>#hash.
>
> - Chris
>
--
Louis LaBrunda
Keystone Software Corp.
SkypeMe callto://PhotonDemon


cbc
Reply | Threaded
Open this post in threaded view
|

Re: #= ==> #hash issues

cbc
In reply to this post by Chris Muller-3
Nice.  It seems like we have consensus on what to change.

I'll push these changes (with the tests) to trunk soon.

The fix I have for #hash was exactly what Elliot suggested.
I'll make sure to include the rehash as well (thanks for the code snippit Bert!)
If no one objects strenuously, I'll also include Eliot's slight rewrite of #= has well - it is marginally cleaner and equally fast, so now is a reasonable time to include it.

I'll delay working on bug #3380 for now - to fix this, we'd have to also add in a check on class in #= to make sure we aren't comparing an interval to an array.  Unless someone has been bitten by this recently, I'd rather wait.

-cbc


On Thu, Nov 15, 2018 at 1:32 PM Chris Muller <[hidden email]> wrote:
> Isn't this extremely simple to fix?
>
> #= is implemented in terms of start, step, and last.
>
> So why not implement #hash as
>
> ^(start hash bitXor: step hash) bitXor: self last hash
>
> Then in the postscript do a
>
> HashedCollection rehashAll
>
> and that should be it, right?
>
> I did a quick check using
>
> Dictionary allInstances select: [ :dict | dict keys anySatisfy: #isInterval]
>
> and the system seems fine after that.

You forgot about the universe that lies beyond the fringes of the
image.  There be persistent data files out there.  And users.
Possibly with systems that rely on Interval>>#hash.

But since most applications don't use non-Integer based Intervals
(since Interval doesn't really support them, by design, I guess),
there's no reason whatsoever to decimate the universe when Eliot's
simply fixes the corner case that no one is using anyway.

+1 to Eliots suggestion to fix Interval>>#hash.

 - Chris



Reply | Threaded
Open this post in threaded view
|

Re: #= ==> #hash issues

Chris Muller-3
> Nice.  It seems like we have consensus on what to change.
>
> I'll push these changes (with the tests) to trunk soon.

Hey, "seems" carries enough uncertainty to give us one final look
before trunk.  By "these changes" are you referring to just the
Interval>>#hash or some Array changes, too?  All we've seen so far are
Collections-cbc.810.mcz, could we get one look at your final draft
proposal before trunk?

On a less important note, I personally find a pure conditional
nomenclature more attractive than the embedded ifTrue:ifFalse:, like:

= anObject
    ^ self == anObject or:
        [ (anObject isInterval
            and:
                [ start = anObject first and:
                    [ step = anObject increment and: [ self last =
anObject last ] ] ])
            or: [ super = anObject ] ]

For whatever and whenever you push, I'm sure you already were but,
just in case, I would be grateful if you would please base it solely
off the current top trunk version with no intermediate versions in the
ancestry.   :-)

Thanks a lot finding this and helping get it fixed!

Best Regards,
  Chris

 - Chris


> The fix I have for #hash was exactly what Elliot suggested.
> I'll make sure to include the rehash as well (thanks for the code snippit Bert!)
> If no one objects strenuously, I'll also include Eliot's slight rewrite of #= has well - it is marginally cleaner and equally fast, so now is a reasonable time to include it.
>
> I'll delay working on bug #3380 for now - to fix this, we'd have to also add in a check on class in #= to make sure we aren't comparing an interval to an array.  Unless someone has been bitten by this recently, I'd rather wait.
>
> -cbc
>
>
> On Thu, Nov 15, 2018 at 1:32 PM Chris Muller <[hidden email]> wrote:
>>
>> > Isn't this extremely simple to fix?
>> >
>> > #= is implemented in terms of start, step, and last.
>> >
>> > So why not implement #hash as
>> >
>> > ^(start hash bitXor: step hash) bitXor: self last hash
>> >
>> > Then in the postscript do a
>> >
>> > HashedCollection rehashAll
>> >
>> > and that should be it, right?
>> >
>> > I did a quick check using
>> >
>> > Dictionary allInstances select: [ :dict | dict keys anySatisfy: #isInterval]
>> >
>> > and the system seems fine after that.
>>
>> You forgot about the universe that lies beyond the fringes of the
>> image.  There be persistent data files out there.  And users.
>> Possibly with systems that rely on Interval>>#hash.
>>
>> But since most applications don't use non-Integer based Intervals
>> (since Interval doesn't really support them, by design, I guess),
>> there's no reason whatsoever to decimate the universe when Eliot's
>> simply fixes the corner case that no one is using anyway.
>>
>> +1 to Eliots suggestion to fix Interval>>#hash.
>>
>>  - Chris
>>
>

cbc
Reply | Threaded
Open this post in threaded view
|

Re: #= ==> #hash issues

cbc
HI Chris,

The changes will be limited to Interval, and will be changes to #= and hash (and the interval test so this doesn't show up again).

I'll push the changes to inbox soon; and to trunk tomorrow/early next week.  The test will go to Trunk with the changes to inbox (the test will be what I've pushed to the inbox minus the 3380 part).

And, yes, I'll rebase if off of the current trunk version - there has been significant changes since my last proposal.

Interestingly:

= anObject
    ^ self == anObject or:
        [ (anObject isInterval
            and:
                [ start = anObject first and:
                    [ step = anObject increment and: [ self last =
anObject last ] ] ])
            or: [ super = anObject ] ]  

This is actually wrong - if the two items to compare are intervals but they don't match based on interval hash (first/last/increment), then it will check if super #= returns true - that is not desirable.  But, I understand the desire you mention here - which I believe is what Eliot was driving for as well.

-cbc

On Thu, Nov 15, 2018 at 2:21 PM Chris Muller <[hidden email]> wrote:
> Nice.  It seems like we have consensus on what to change.
>
> I'll push these changes (with the tests) to trunk soon.

Hey, "seems" carries enough uncertainty to give us one final look
before trunk.  By "these changes" are you referring to just the
Interval>>#hash or some Array changes, too?  All we've seen so far are
Collections-cbc.810.mcz, could we get one look at your final draft
proposal before trunk?

On a less important note, I personally find a pure conditional
nomenclature more attractive than the embedded ifTrue:ifFalse:, like:

= anObject
    ^ self == anObject or:
        [ (anObject isInterval
            and:
                [ start = anObject first and:
                    [ step = anObject increment and: [ self last =
anObject last ] ] ])
            or: [ super = anObject ] ]

For whatever and whenever you push, I'm sure you already were but,
just in case, I would be grateful if you would please base it solely
off the current top trunk version with no intermediate versions in the
ancestry.   :-)

Thanks a lot finding this and helping get it fixed!

Best Regards,
  Chris

 - Chris


> The fix I have for #hash was exactly what Elliot suggested.
> I'll make sure to include the rehash as well (thanks for the code snippit Bert!)
> If no one objects strenuously, I'll also include Eliot's slight rewrite of #= has well - it is marginally cleaner and equally fast, so now is a reasonable time to include it.
>
> I'll delay working on bug #3380 for now - to fix this, we'd have to also add in a check on class in #= to make sure we aren't comparing an interval to an array.  Unless someone has been bitten by this recently, I'd rather wait.
>
> -cbc
>
>
> On Thu, Nov 15, 2018 at 1:32 PM Chris Muller <[hidden email]> wrote:
>>
>> > Isn't this extremely simple to fix?
>> >
>> > #= is implemented in terms of start, step, and last.
>> >
>> > So why not implement #hash as
>> >
>> > ^(start hash bitXor: step hash) bitXor: self last hash
>> >
>> > Then in the postscript do a
>> >
>> > HashedCollection rehashAll
>> >
>> > and that should be it, right?
>> >
>> > I did a quick check using
>> >
>> > Dictionary allInstances select: [ :dict | dict keys anySatisfy: #isInterval]
>> >
>> > and the system seems fine after that.
>>
>> You forgot about the universe that lies beyond the fringes of the
>> image.  There be persistent data files out there.  And users.
>> Possibly with systems that rely on Interval>>#hash.
>>
>> But since most applications don't use non-Integer based Intervals
>> (since Interval doesn't really support them, by design, I guess),
>> there's no reason whatsoever to decimate the universe when Eliot's
>> simply fixes the corner case that no one is using anyway.
>>
>> +1 to Eliots suggestion to fix Interval>>#hash.
>>
>>  - Chris
>>
>



Reply | Threaded
Open this post in threaded view
|

Re: #= ==> #hash issues

Bert Freudenberg
Somehow I missed Eliot's version, but unsurprisingly he had exactly the same idea (use "last" not "stop" for hash). I'd still think bitXor: is preferable to bitOr, that is the standard way in almost all hash methods. But ...

BUT: I forgot about the super fallback in #=. That makes this discussion pretty much moot, because since

#(1 2 3) = (1 to: 3) "true"

is true, this must also be true:

#(1 2 3) hash = (1 to: 3) hash  "must be true"

So the only proper fix IMHO is to remove #hash from Interval (or replace it with ^super hash and a proper comment)

- Bert -



Reply | Threaded
Open this post in threaded view
|

Re: #= ==> #hash issues

Eliot Miranda-2
Hi Bert,

On Thu, Nov 15, 2018 at 4:38 PM Bert Freudenberg <[hidden email]> wrote:
Somehow I missed Eliot's version, but unsurprisingly he had exactly the same idea (use "last" not "stop" for hash). I'd still think bitXor: is preferable to bitOr, that is the standard way in almost all hash methods. But ...

BUT: I forgot about the super fallback in #=. That makes this discussion pretty much moot, because since

#(1 2 3) = (1 to: 3) "true"

is true, this must also be true:

#(1 2 3) hash = (1 to: 3) hash  "must be true"

So the only proper fix IMHO is to remove #hash from Interval (or replace it with ^super hash and a proper comment)


We discussed this a couple of weeks ago.  There is no need for 
    #(1 2 3) = (1 to: 3) 
to be true. 
    #(1 2 3) = #[1 2 3]
isn’t true.  And we have hasEqualElements:.  So a more coherent approach is for the hack that makes intervals equal to arrays be discarded, and the hashes kept distinct.
 

- Bert -




--
_,,,^..^,,,_
best, Eliot


Reply | Threaded
Open this post in threaded view
|

Re: #= ==> #hash issues

Bert Freudenberg
On Thu, Nov 15, 2018 at 5:36 PM Eliot Miranda <[hidden email]> wrote:
Hi Bert,

On Thu, Nov 15, 2018 at 4:38 PM Bert Freudenberg <[hidden email]> wrote:
Somehow I missed Eliot's version, but unsurprisingly he had exactly the same idea (use "last" not "stop" for hash). I'd still think bitXor: is preferable to bitOr, that is the standard way in almost all hash methods. But ...

BUT: I forgot about the super fallback in #=. That makes this discussion pretty much moot, because since

#(1 2 3) = (1 to: 3) "true"

is true, this must also be true:

#(1 2 3) hash = (1 to: 3) hash  "must be true"

So the only proper fix IMHO is to remove #hash from Interval (or replace it with ^super hash and a proper comment)


We discussed this a couple of weeks ago.  There is no need for 
    #(1 2 3) = (1 to: 3) 
to be true. 
    #(1 2 3) = #[1 2 3]
isn’t true.  And we have hasEqualElements:.  So a more coherent approach is for the hack that makes intervals equal to arrays be discarded, and the hashes kept distinct.

Makes sense. The version you posted ("I would have written...") still delegated to super>>= so I thought we wanted to keep that. But I agree that it's of little utility.

- Bert -


 


cbc
Reply | Threaded
Open this post in threaded view
|

Re: #= ==> #hash issues

cbc
In reply to this post by Eliot Miranda-2


On Thu, Nov 15, 2018 at 5:36 PM Eliot Miranda <[hidden email]> wrote:
Hi Bert,

On Thu, Nov 15, 2018 at 4:38 PM Bert Freudenberg <[hidden email]> wrote:
Somehow I missed Eliot's version, but unsurprisingly he had exactly the same idea (use "last" not "stop" for hash). I'd still think bitXor: is preferable to bitOr, that is the standard way in almost all hash methods. But ...

BUT: I forgot about the super fallback in #=. That makes this discussion pretty much moot, because since

#(1 2 3) = (1 to: 3) "true"

is true, this must also be true:

#(1 2 3) hash = (1 to: 3) hash  "must be true"

So the only proper fix IMHO is to remove #hash from Interval (or replace it with ^super hash and a proper comment)


We discussed this a couple of weeks ago.  There is no need for 
    #(1 2 3) = (1 to: 3) 
to be true. 
    #(1 2 3) = #[1 2 3]
isn’t true.  And we have hasEqualElements:.  So a more coherent approach is for the hack that makes intervals equal to arrays be discarded, and the hashes kept distinct.
 
And I agreed with you weeks ago, but looking at it closer, the code specifically says Interval is a species of Array.
Interestingly, ByteArray, which is a subclass of ArrayedCollection, doesn't set its species, so its species is ByteArray.  Which is desirable.

If we change the Interval #species to not be array, then many things break with Interval - most notably #select: and #collect:, so a major overhaul would be in store for that part of the code.

In line with Bert's allusion, if we removed the super = call, then #= is no longer associative between Interval's and Arrays:
(1 to: 3) = #(1 2 3) "false"
#(1 2 3) = (1 to: 3)" true"

So, I'm just fixing the Interval only part and punting on the issue between Interval and Array for now.

-cbc
 

- Bert -




--
_,,,^..^,,,_
best, Eliot



cbc
Reply | Threaded
Open this post in threaded view
|

Re: #= ==> #hash issues

cbc
In reply to this post by Eliot Miranda-2
Sorry for the excessive delay in responding to these threads.
On Thu, Nov 15, 2018 at 5:36 PM Eliot Miranda <[hidden email]> wrote:
Hi Bert,

On Thu, Nov 15, 2018 at 4:38 PM Bert Freudenberg <[hidden email]> wrote:
Somehow I missed Eliot's version, but unsurprisingly he had exactly the same idea (use "last" not "stop" for hash). I'd still think bitXor: is preferable to bitOr, that is the standard way in almost all hash methods. But ...

BUT: I forgot about the super fallback in #=. That makes this discussion pretty much moot, because since

#(1 2 3) = (1 to: 3) "true"

is true, this must also be true:

#(1 2 3) hash = (1 to: 3) hash  "must be true"

So the only proper fix IMHO is to remove #hash from Interval (or replace it with ^super hash and a proper comment)


We discussed this a couple of weeks ago.  There is no need for 
    #(1 2 3) = (1 to: 3) 
to be true. 
    #(1 2 3) = #[1 2 3]
isn’t true.  And we have hasEqualElements:.  So a more coherent approach is for the hack that makes intervals equal to arrays be discarded, and the hashes kept distinct.
 
Actually, the hack is that interval is a subclass of SequenceableCollection with species defined as Array. This makes lots of things very nice - like #collect: and #select: just work.  If we removed #species (which would be necessary to make interval and array not be equal), that would require re-implementing these two methods - and many, many more - from the superclasses.

Basically, that hack is a fundamental part of how the class is built today.

Are we ok with us taking on that much of a change?

-cbc
 

- Bert -




--
_,,,^..^,,,_
best, Eliot



Reply | Threaded
Open this post in threaded view
|

Re: #= ==> #hash issues

Eliot Miranda-2
Hi Chris,

On Nov 20, 2018, at 11:11 AM, Chris Cunningham <[hidden email]> wrote:

Sorry for the excessive delay in responding to these threads.
On Thu, Nov 15, 2018 at 5:36 PM Eliot Miranda <[hidden email]> wrote:
Hi Bert,

On Thu, Nov 15, 2018 at 4:38 PM Bert Freudenberg <[hidden email]> wrote:
Somehow I missed Eliot's version, but unsurprisingly he had exactly the same idea (use "last" not "stop" for hash). I'd still think bitXor: is preferable to bitOr, that is the standard way in almost all hash methods. But ...

BUT: I forgot about the super fallback in #=. That makes this discussion pretty much moot, because since

#(1 2 3) = (1 to: 3) "true"

is true, this must also be true:

#(1 2 3) hash = (1 to: 3) hash  "must be true"

So the only proper fix IMHO is to remove #hash from Interval (or replace it with ^super hash and a proper comment)


We discussed this a couple of weeks ago.  There is no need for 
    #(1 2 3) = (1 to: 3) 
to be true. 
    #(1 2 3) = #[1 2 3]
isn’t true.  And we have hasEqualElements:.  So a more coherent approach is for the hack that makes intervals equal to arrays be discarded, and the hashes kept distinct.
 
Actually, the hack is that interval is a subclass of SequenceableCollection with species defined as Array. This makes lots of things very nice - like #collect: and #select: just work.  If we removed #species (which would be necessary to make interval and array not be equal), that would require re-implementing these two methods - and many, many more - from the superclasses.

Basically, that hack is a fundamental part of how the class is built today.

IMO it is not a hack.  But it has nothing to do with whether an Interval with equal elements to an Array is equal to it.  A ByteArray is also a SequenceableCollection and is not equal to an Array if it has equal elements.  It has a different species to Array, but species exists, as you’ve noted, for the convenience of select: & collect: so that immutable collections can answer a suitable mutable class to be used to construct the result.


Are we ok with us taking on that much of a change?

No one is suggesting changing the species of Interval.

-cbc
 

- Bert -




--
_,,,^..^,,,_
best, Eliot




12