true hash


Re: true hash

Nicolas Cellier
2012/5/10 Chris Muller <[hidden email]>:

>>> I should add, it's already happened.  In 2009 Levente changed
>>> Object>>#identityHash to answer the scaledIdentityHash.
>>
>> Not in Squeak. Our IdentityDictionary uses scaledIdentityHash nowadays, but identityHash itself is left alone, answering the primitive value directly.
>
> I meant to say Object>>#hash, not #identityHash.
>
> So, before 12/1/2009:
>
>     true hash  "2950"
>
> but after 12/1/2009
>
>     true hash  "773324800"
>
> So, any saved persistent EToys ReferenceStream object-models files
> with true involved in the calculation of #hash prior to 2009 will now
> be goofed up unless you remember to rehash all regular Dictionaries
> after loading it.  The properties of this bug are:
>
>  - it is hidden, you had no idea it was there because no SUnit test
> can possibly catch it.  It didn't show until production.
>  - it is image-specific -- you load the file in an image from before
> Levente's change and everything seems fine.  What's going on?
>  - it is "intermittent" because there's a small possibility that, if
> the Dictionary were small, you might get lucky with a "hit" anyway
> when calculating the slot to start searching at
>  - it could lead to corrupt data model, because perhaps the app does
> something like #at:ifAbsentPut:, and maybe even on an
> otherwise-equivalent object, so you end up with TWO of the "same"
> object in the dictionary.  What a disaster!
>
> Now does it make sense?
>

No, normally there should be a rehash on reload if I remember... Or we
added one.
Would you check senders?

Nicolas


Re: true hash

Eliot Miranda-2
In reply to this post by Chris Muller-3


On Thu, May 10, 2012 at 10:16 AM, Chris Muller <[hidden email]> wrote:
> [...]
> Now does it make sense?

No.  One *always* has to rehash on loading binary since one cannot guarantee that identityHashes will be the same in the loading environment as the saving environment.  It's a non-issue.

--
best,
Eliot
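
The stale-hash failure and the rehash remedy discussed above can be reproduced in a workspace. This is only a minimal sketch: it assumes that OrderedCollection>>hash depends on the collection's contents (as SequenceableCollection>>hash does in Squeak), and it uses a mutated key as a stand-in for "true hash" answering a different value in the loading image than in the saving image.

```smalltalk
"Workspace sketch; the key and values are hypothetical.
 Mutating the key after insertion simulates a hash that changed
 between saving and loading."
| key dict |
key := OrderedCollection with: 1 with: 2.
dict := Dictionary new.
dict at: key put: #stored.

key add: 3.	"contents changed, so the key's hash no longer matches the slot it was filed under"

dict includesKey: key.	"typically false; in a small table a lucky hit is still possible"

dict rehash.	"re-files every association under its current hash"
dict includesKey: key.	"true"
```

The first lookup is the "intermittent" case described in the quoted message: whether it fails depends on where the new hash happens to land in the table. After rehash the lookup succeeds regardless.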




Re: DateAndTime hash was: Re: [squeak-dev] true hash

Bert Freudenberg
In reply to this post by Eliot Miranda-2

On 10.05.2012, at 19:05, Eliot Miranda wrote:

> On Thu, May 10, 2012 at 6:21 AM, Paul DeBruicker <[hidden email]> wrote:
>> On 05/10/2012 04:21 AM, Nicolas Cellier wrote:
>>> Sure, I already changed various Number>>hash and could as well change
>>> Point hash to follow recommendations from Andres Valloud's book Hashing
>>> in Smalltalk...
>>>
>>> Nicolas
>>
>> DateAndTime>>#hash could be changed to:
>>
>> hash
>>        ^ (jdn hashMultiply bitXor: seconds + offset asSeconds) bitXor: nanos
>>
>> which is 130x faster than what's currently in the image:
>>
>> hash
>>        ^ self asUTC ticks hash
>>
>> The collision rate on the proposed hash function is 0.04% (4 per 10,000)
>
> Just doit.  Make that change.  If you don't have commit rights, commit it to inbox.  please!

But take care with rehashing existing sets/collections.

- Bert -





Re: true hash

Bert Freudenberg
In reply to this post by Eliot Miranda-2

On 10.05.2012, at 19:22, Eliot Miranda wrote:

> No.  One *always* has to rehash on loading binary since one cannot guarantee that identityHashes will be the same in the loading environment as the saving environment.  It's a non-issue.

Yep. See e.g. ImageSegment>>restoreEndianness (which does a bit more than the name suggests).

- Bert -




Re: true hash

Chris Muller-3
Gentlemen.  Models extend outside the image and that's the context
I've been describing the issue from the start of the thread.  The
example for Bert was just a "sub-case" of the more general problem, I
was just trying to be illustrative with a more close-to-home scenario.

Please tell me what your solution is for the case where the legacy
persistent model is a MagmaDictionary hosted on-line with 60M elements
in it -- running right now in production?  Rehashing is no solution, even
if it were possible and feasible, because each of hundreds of
*clients* might have a different hash value for true, so the shared
model would be corrupted.  I argue they should be consistent.

I accept the counter-argument that this "probably wouldn't ever
happen" -- but I'm also concerned about the *severity* of the
punishment were it to occur, which I've already laid out.  The
solution is painless; I do not understand your objections.

Smalltalk is breaking out of the local image, expanding into the
network, so we should accept the idea of a universal "value" of true,
not just the true object local to the running image.  For that idea to
be safer, we should NOT continue to depend on true's #identityHash --
just as Bert said was a bad idea.  We should instead allow the
identityHash to vary independently from its #hash, a constant, in case
it needs to again (as it did on 12/1/2009).





Re: DateAndTime hash was: Re: [squeak-dev] true hash

Paul DeBruicker
In reply to this post by Eliot Miranda-2
On 05/10/2012 10:05 AM, Eliot Miranda wrote:
> Just doit.  Make that change.  If you don't have commit rights, commit
> it to inbox.  please!


I don't have rights and did save the changes to the inbox as:

Kernel-pad.663.mcz


Re: DateAndTime hash was: Re: [squeak-dev] true hash

Nicolas Cellier
Sorry to duplicate, but it breaks this test:

| date1 date2 |
"Equal DateAndTimes must answer equal hashes, whatever their offsets."
date1 := DateAndTime new
        ticks: (DateAndTime unixEpoch + 1 hours) ticks
        offset: 0 hours.
date2 := DateAndTime new
        ticks: (DateAndTime unixEpoch - 2 hours) ticks
        offset: -3 hours.
self assert: (date1 = date2) ==> [date1 hash = date2 hash]


Re: true hash

Nicolas Cellier
In reply to this post by Chris Muller-3
Chris, I understand your concern about possible future evolutions.
But do you realize this change has absolutely no value in a general image?

We simply don't want to preserve hash values.
It would be a terrible limitation.
We want the right to let hash methods evolve.

The subproblem of nil/true/false identityHash simply does not exist for us.
As long as we hook to the soleInstance on load, and don't forget to
rehash in memory copy, I see no problem in any other applications for
sharing nil/true/false.

For legacy magma, I'd say you have to impose some usage restrictions,
this generally goes along with legacy code.
I suggest you add nil,true,false hash as a magma extension if you
think it will preserve your future (don't forget to rehash in-memory
in post-install-script).
But let me remind you that you also depend on every other hash method
definition, and that's quite fragile.
I'm not really sure all the other hash methods are the same in Squeak and Pharo.
Even less sure that they will evolve jointly.

For future magma, then you have to think...

2012/5/10 Chris Muller <[hidden email]>:

> Gentlemen.  Models extend outside the image and that's the context
> I've been describing the issue from the start of the thread.  The
> example for Bert was just a "sub-case" of the more general problem, I
> was just trying to be illustrative with a more close-to-home scenario.
>
> Please tell me what is your solution for the case where the legacy
> persistent model is a MagmaDictionary hosted on-line with 60M elements
> in it -- running right now in production?  rehash is no solution even
> if it were possible and feasible, because each of hundreds of
> *clients* might have different hash values for true, so the shared
> model is being corrupted.  I argue they should be consistent.
>

So how can we possibly know the context of your trouble?
Who is responsible for producing/updating client images?
I'd say arrange for having immutable hash in those images...
If you're in control, DIY.
If your clients are in control, emit recommendations and warn them
about the danger of not following these.

Nicolas


Re: true hash

Bert Freudenberg
In reply to this post by Chris Muller-3
On 10.05.2012, at 21:15, Chris Muller wrote:

> Smalltalk is breaking out of the local image, expanding into the
> network, so we should accept the idea of a universal "value" of true,
> not just the true object local to the running image.  For that idea to
> be safer, we should NOT continue to depend on true's #identityHash --
> just as Bert said was a bad idea.  We should instead allow the
> identityHash to vary independently from its #hash, a constant, in case
> it needs to again (as it did on 12/1/2009).


Nothing I can think of depends on the actual value of the hash of true. All that is needed is that in a given image, it is constant, since true is a constant.

I must still be missing something. "Models extend outside the image", true. But leaking implementation detail into the external model is just bad design. And the hash of an object is very much an implementation detail, IMHO.

I think what you are saying is that, instead of leaving that work to the server, clients in Magma perform some index calculation on their own, based on the object's hash, and send that index to the server? Or they rely on the iteration order in the dictionary being the same on the server? Something like that?

- Bert -



Re: DateAndTime hash was: Re: [squeak-dev] true hash

Nicolas Cellier
In reply to this post by Nicolas Cellier
Maybe something like this:

hash
        | totalSeconds |
        "86400 seconds per day: whole days in totalSeconds are carried into the day number."
        totalSeconds := seconds - offset asSeconds.
        ^ ((totalSeconds // 86400 + jdn) hashMultiply
                bitXor: totalSeconds \\ 86400) bitXor: nanos

Nicolas
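
Why the first proposal breaks the test while this one passes can be checked with plain integers. The sketch below is only a model, not the DateAndTime implementation: it assumes the instance variables hold the local day number and local seconds-of-day plus the offset, and it takes 2440588 as the Julian day number of 1970-01-01. The block and variable names are made up for the illustration.

```smalltalk
"Plain-integer model of the two proposals; the block arguments stand in for
 the jdn, seconds, offset (in seconds) and nanos instance variables."
| firstProposal secondProposal d1 d2 |
firstProposal := [:jdn :secs :offsetSecs :nanos |
	(jdn hashMultiply bitXor: secs + offsetSecs) bitXor: nanos].
secondProposal := [:jdn :secs :offsetSecs :nanos | | total |
	total := secs - offsetSecs.
	((total // 86400 + jdn) hashMultiply bitXor: total \\ 86400) bitXor: nanos].

"d1: 1970-01-01 01:00 at offset 0; d2: 1969-12-31 22:00 at offset -3 hours.
 Both denote the same instant."
d1 := #(2440588 3600 0 0).
d2 := #(2440587 79200 -10800 0).

(firstProposal valueWithArguments: d1) = (firstProposal valueWithArguments: d2).	"false"
(secondProposal valueWithArguments: d1) = (secondProposal valueWithArguments: d2).	"true"
```

In the second proposal both inputs reduce to day 2440588 and 3600 seconds before hashing (for d2, 79200 + 10800 = 90000 seconds, i.e. one whole day plus 3600), so equal instants hash alike. The first proposal hashes two different day numbers and never reconciles them.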


Re: DateAndTime hash was: Re: [squeak-dev] true hash

Randal L. Schwartz
In reply to this post by Nicolas Cellier
>>>>> "Nicolas" == Nicolas Cellier <[hidden email]> writes:

Nicolas> Sorry to duplicate but it breaks this test
Nicolas> | date1 date2 |
Nicolas> date1 := DateAndTime new ticks: (DateAndTime unixEpoch + 1 hours)
Nicolas> ticks offset: 0 hours.
Nicolas> date2 := DateAndTime new ticks: (DateAndTime unixEpoch - 2 hours)
Nicolas> ticks offset: -3 hours.
Nicolas> self assert: (date1 = date2) ==> [date1 hash = date2 hash]

Isn't that test broken?  What if the first and second call to unixEpoch
differ by one second?  Or is that handled in the assert?

--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<[hidden email]> <URL:http://www.stonehenge.com/merlyn/>
Smalltalk/Perl/Unix consulting, Technical writing, Comedy, etc. etc.
See http://methodsandmessages.posterous.com/ for Smalltalk discussion


Re: true hash

Chris Muller-3
In reply to this post by Nicolas Cellier
Ok, I can understand that.  It is so very, very far from being an issue
in most folks' projects that there's no value in my proposal being
"harmless" if they are already free from harm.  It is then seen as an
unnecessary change that potentially closes off some openness.  Fair
enough.

It's even far for me, but not as far, so I'll just try it at my-app
level (Magma) for now.




Re: true hash

Chris Muller-3
In reply to this post by Bert Freudenberg
> I must still be missing something. "Models extend outside the image", true. But leaking implementation detail into the external model is just bad design. And the hash of an object is very much an implementation detail, IMHO.

I wouldn't call it bad design -- but I do agree hashes are a bit more
dicey, as we see with someone just now discovering a DateAndTime hash
improvement.  Great!

However, hash improvements are generally few and far between; with a
little care, the rewards of higher transparency can still be had.  Like
ReferenceStream, Magma "solves" it by rehashing standard Dictionaries
on materialization, but for some of the special hashed collection types
this is either not supported or impractical.


Re: DateAndTime hash was: Re: [squeak-dev] true hash

Paul DeBruicker
In reply to this post by Nicolas Cellier
On 05/10/2012 02:18 PM, Nicolas Cellier wrote:
> Maybe something like this
>
> hash
> | totalSeconds |
> totalSeconds := seconds - offset asSeconds.
> ^ ((totalSeconds // 86400 + jdn) hashMultiply bitXor: totalSeconds \\
> 86400) bitXor: nanos
>
> Nicolas


That does well too.  It's better than what I proposed as far as
collisions go, but a little bit slower on my machine.  It has ~1
collision per 10,000 and is 100x faster than the original hash method.


Your test doesn't fail with the hash I proposed on Pharo-1.3 but
definitely fails on Squeak4.3.


I'll put up an amended version in the inbox with your hash and modify
the Pharo issue.


Also, in Squeak 4.3 these tests fail after the hash change because they
test that the hash is a specific SmallInteger:

DateAndTimeEpochTest>>#testHash
DateAndTimeLeapTest>>#testHash
TimespanTest>>#testHash


Re: DateAndTime hash

Bert Freudenberg
In reply to this post by Randal L. Schwartz
On 11.05.2012, at 00:15, Randal L. Schwartz wrote:

> What if the first and second call to unixEpoch differ by one second?


The unix epoch is a constant.  

(Can't believe I get to educate Randal on Unixy things. Yay!)

- Bert -



Re: DateAndTime hash

David T. Lewis
On Fri, May 11, 2012 at 12:01:18PM +0200, Bert Freudenberg wrote:
> On 11.05.2012, at 00:15, Randal L. Schwartz wrote:
>
> > What if the first and second call to unixEpoch differ by one second?
>
>
> The unix epoch is a constant.  

FYI, in the interpreter VM a primitive is available for directly accessing
time since the unix epoch, and can be accessed as follows:

Time class>>primUtcWithOffset
        "Answer an array with UTC microseconds since the Posix epoch and
        the current seconds offset from GMT in the local time zone."

        "Time primUtcWithOffset"

        <primitive: 'primitiveUtcWithOffset'>
        ^nil


>
> (Can't believe I get to educate Randal on Unixy things. Yay!)
>
> - Bert -
>

A significant milestone indeed ;)

Dave



Re: DateAndTime hash

Randal L. Schwartz
In reply to this post by Bert Freudenberg
>>>>> "Bert" == Bert Freudenberg <[hidden email]> writes:

Bert> On 11.05.2012, at 00:15, Randal L. Schwartz wrote:
>> What if the first and second call to unixEpoch differ by one second?


Bert> The unix epoch is a constant.  

Bert> (Can't believe I get to educate Randal on Unixy things. Yay!)

Ahh, thought it was a call to return the current time in unix-epochy values.


Re: DateAndTime hash was: Re: [squeak-dev] true hash

Chris Muller-3
In reply to this post by Paul DeBruicker
DateAndTime>>#hash has been a bane of (poor) performance for many
years, especially for non-UTC dates.  This is so excellent Paul --
thanks a ton!



Re: DateAndTime hash was: Re: [squeak-dev] true hash

Andres Valloud-4
In reply to this post by Paul DeBruicker
When asserting speedups and collision rates, please also mention which
dataset was used for the measurements.  That way, others can
replicate the results and compare hash functions on a consistent basis.
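
One way to make such a measurement reproducible is to post the dataset as code along with the numbers. The sketch below uses a made-up dataset (10,000 DateAndTimes, one per minute starting at the Unix epoch); it is not the dataset behind any figure quoted in this thread, and the repeat count is arbitrary.

```smalltalk
"Hypothetical dataset: 10,000 DateAndTimes, one per minute from the Unix epoch."
| dates distinctHashes collisions millis |
dates := (0 to: 9999) collect: [:i | DateAndTime unixEpoch + (i * 60) seconds].

"Collisions: distinct values that share a hash."
distinctHashes := (dates collect: [:each | each hash]) asSet.
collisions := dates size - distinctHashes size.

"Timing: 100 passes over the dataset, i.e. 1,000,000 sends of #hash."
millis := Time millisecondsToRun: [
	100 timesRepeat: [dates do: [:each | each hash]]].

Transcript
	show: 'collisions: ', collisions printString,
		'  ms per 1,000,000 hashes: ', millis printString;
	cr.
```

Running the same snippet before and after changing DateAndTime>>#hash gives directly comparable figures. A dataset mixing several offsets would be needed to exercise the non-UTC case.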


Re: DateAndTime hash was: Re: [squeak-dev] true hash

Levente Uzonyi-2
On Sat, 12 May 2012, Andres Valloud wrote:

> When asserting speedups and collision rates, please also mention what was the
> dataset used for the measurements.  In that way, others can replicate the
> results and compare hash functions on a consistent basis.

+1. I started hacking it a bit and got another ~2x speedup on my
benchmark, but it was too simple and I didn't have the time to dig deeper.


Levente
