true hash

Chris Muller-3

true hash

Where do true, false and nil obtain their hash value? They inherit
#hash from Object, so it is their identityHash, but I noticed this is
consistent between images -- how? It's great, but is there any danger
of that value ever changing? That would be bad..

Bert Freudenberg

Re: true hash

On 09.05.2012, at 21:28, Chris Muller wrote:

> Where do true, false and nil obtain their hash value? They inherit
> #hash from Object, so it is their identityHash, but I noticed this is
> consistent between images -- how? It's great, but is there any danger
> of that value ever changing? That would be bad..

The identity hash bits are stored in each object's header. And since true, false, nil are the same decades old instances, their hash did not change.

Depending on what the SystemTracer does, it may be different in an image derived by that though. E.g. you may want to check a 64 bit image.

- Bert -

Chris Muller-3

Re: true hash

>> Where do true, false and nil obtain their hash value? They inherit
>> #hash from Object, so it is their identityHash, but I noticed this is
>> consistent between images -- how? It's great, but is there any danger
>> of that value ever changing? That would be bad..
>
>
> The identity hash bits are stored in each object's header. And since true, false, nil are the same decades old instances, their hash did not change.
>
> Depending on what the SystemTracer does, it may be different in an image derived by that though. E.g. you may want to check a 64 bit image.

So, just in thinking about it -- that is a VERY distant dependency
that could manifest as a bug way up at the app level in production.
No SUnit test would be able to catch it, and when trying to
debug it in production, invariably someone with the "classic" hashes
would not be able to reproduce the problem.. Hopefully it would only be a
"lookup" problem but what if it was in the context of an
#at:ifAbsentPut:? What a nightmare!

So, it would seem a good idea to override #hash to return their
current value as a fixed constant.
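
For illustration, the override proposed here might look like the sketch below when compiled in class True. The constant is the value of "true hash" that Chris reports later in this thread; matching methods on False and UndefinedObject are assumed and would need their own constants.

```smalltalk
hash
	"Sketch only: compiled in class True, this would replace the inherited
	Object>>hash with a fixed value. 773324800 is the value of 'true hash'
	reported later in this thread; it is not taken from any released image."
	^ 773324800
```

With such a method in place, "true hash" would no longer depend on the identity hash bits stored in the object's header.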

Thoughts?

Bert Freudenberg

Re: true hash

On 09.05.2012, at 22:19, Chris Muller wrote:

>>> Where do true, false and nil obtain their hash value? They inherit
>>> #hash from Object, so it is their identityHash, but I noticed this is
>>> consistent between images -- how? It's great, but is there any danger
>>> of that value ever changing? That would be bad..
>>
>>
>> The identity hash bits are stored in each object's header. And since true, false, nil are the same decades old instances, their hash did not change.
>>
>> Depending on what the SystemTracer does, it may be different in an image derived by that though. E.g. you may want to check a 64 bit image.
>
> So, just in thinking about it -- that is a VERY distant dependency
> that could manifest as a bug way up at the app level in production.
> No SUnit test would be able to catch it, and when trying to
> debug it in production, invariably someone with the "classic" hashes
> would not be able to reproduce the problem.. Hopefully it would only be a
> "lookup" problem but what if it was in the context of an
> #at:ifAbsentPut:? What a nightmare!
>
> So, it would seem a good idea to override #hash to return their
> current value as a fixed constant.
>
> Thoughts?

-1

Why would you depend on the exact value of the identity hash?

When writing a new image and the identity hash changed, then obviously all dictionaries would have to be rehashed. Seems like a non-issue to me.

Besides, YAGNI. If and when you need that you could still add these methods. Though you won't ;^)

- Bert -

Chris Muller-3

Re: true hash

> Why would you depend on the exact value of the identity hash?

Not the value of the identityHash, the value of the hash.

> When writing a new image and the identity hash changed, then obviously all dictionaries would have to be rehashed. Seems like a non-issue to me.

That's the exact problem I want to avoid. Think in the context of a
multiuser client-server app accessing the same large persistent domain
model. The domain model includes a complex domain object used as a
key in a dictionary, and one of the attributes used to determine its
hash is one of its boolean attributes.

Now, some of the clients suddenly happen to get a new #identityHash
for true, which in turn changes its #hash. Now they cannot access the
domain objects whose hash depends on true or false. Worse, they might
#at:put: into the dictionary and then the model is "corrupted" because there is no
universally-consistent notion of true's hash across all client images.
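
A minimal sketch of the kind of domain object described above. The class Account and its instance variables number (an Integer) and active (a Boolean) are hypothetical names chosen for illustration; each method below would be compiled separately in that class.

```smalltalk
number
	^ number

active
	^ active

= other
	"Two accounts are equal when both attributes are equal."
	^ self class == other class
		and: [number = other number and: [active = other active]]

hash
	"The Boolean contributes 'true hash' or 'false hash', which is
	inherited from Object and derived from the object's identity hash.
	If that value differs between two images, so does this hash."
	^ number hash bitXor: active hash
```

Any Dictionary or Set that stores such an Account as a key places it in a slot derived from this hash, which is why a different value for "true hash" in another image would send lookups to a different slot.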

You're right, I could add my own private extensions, but we should
consider having "universal" atomics given the
insidiousness of the above situation..?

Eliot Miranda-2

Re: true hash

On Wed, May 9, 2012 at 1:46 PM, Chris Muller <[hidden email]> wrote:

>> Why would you depend on the exact value of the identity hash?
>
> Not the value of the identityHash, the value of the hash.
>
>> When writing a new image and the identity hash changed, then obviously all dictionaries would have to be rehashed. Seems like a non-issue to me.
>
> That's the exact problem I want to avoid. Think in the context of a
> multiuser client-server app accessing the same large persistent domain
> model. The domain model includes a complex domain object used as a
> key in a dictionary, and one of the attributes used to determine its
> hash is one of its boolean attributes.
>
> Now, some of the clients suddenly happen to get a new #identityHash
> for true, which in turn changes its #hash. Now they cannot access the
> domain objects whose hash depends on true or false. Worse, they might
> #at:put: into it and then the model is "corrupted" because there is no
> universally-consistent notion of true's hash across all client images.

How so? Given that within an image the hash and dictionaries hashed there-on are consistent how does it matter if two different images have different hashes and different hash values? How do these clients suddenly acquire new hashes for nil, true and false? Smalltalk systems have been like this for many decades and have not found this to be a significant problem in practice. There can be performance advantages with consistent hashes (e.g. of symbols, avoiding having to rehash method dictionaries on code load). But no fundamental problem.

BTW, there are even Lisp systems that use an object's identity as its hash /and/ have moving garbage collectors, so that all identity-hashed collections are rehashed after GC. These systems keep working too, even though an object's hash changes through its lifetime, let alone differs between systems.

> You're right, I could add my own private extensions, but we should
> consider having "universal" atomics given the aforementioned
> insidiousness of the above situation..?

--
best,

Eliot

Nicolas Cellier

Re: true hash

In reply to this post by Chris Muller-3

2012/5/9 Chris Muller <[hidden email]>:

>> Why would you depend on the exact value of the identity hash?
>
> Not the value of the identityHash, the value of the hash.
>
>> When writing a new image and the identity hash changed, then obviously all dictionaries would have to be rehashed. Seems like a non-issue to me.
>
> That's the exact problem I want to avoid. Think in the context of a
> multiuser client-server app accessing the same large persistent domain
> model. The domain model includes a complex domain object used as a
> key in a dictionary, and one of the attributes used to determine its
> hash is one of its boolean attributes.
>
> Now, some of the clients suddenly happen to get a new #identityHash
> for true, which in turn changes its #hash. Now they cannot access the
> domain objects whose hash depends on true or false. Worse, they might
> #at:put: into it and then the model is "corrupted" because there is no
> universally-consistent notion of true's hash across all client images.
>
> You're right, I could add my own private extensions, but we should
> consider having "universal" atomics given the aforementioned
> insidiousness of the above situation..?
>

If this can only occur in this persistence scheme, then you obviously
require a #persistentHash.

Object>>persistentHash ^self hash
True,False,UndefinedObject>>persistentHash ^some specific constant...

It is true that any other literal, arrays of such, or any hash-caring
object built of such would be cross-image hash-persistent...
So we are very close to it.

But YAGNI, your hash sounds hackish...
Maybe your own image tracer could implement the hack too ?

Nicolas

David T. Lewis

Re: true hash

In reply to this post by Bert Freudenberg

On Wed, May 09, 2012 at 09:57:16PM +0200, Bert Freudenberg wrote:

> On 09.05.2012, at 21:28, Chris Muller wrote:
>
> > Where do true, false and nil obtain their hash value? They inherit
> > #hash from Object, so it is their identityHash, but I noticed this is
> > consistent between images -- how? It's great, but is there any danger
> > of that value ever changing? That would be bad..
>
>
> The identity hash bits are stored in each object's header. And since true, false, nil are the same decades old instances, their hash did not change.
>
> Depending on what the SystemTracer does, it may be different in an image derived by that though. E.g. you may want to check a 64 bit image.
>
> - Bert -

FWIW, on a 32-bit image:

ImageFormat thisImageFileFormat asInteger ==> 6504
Smalltalk wordSize ==> 4
nil identityHash ==> 3840
true identityHash ==> 2950
false identityHash ==> 3152

And on a 64-bit image:

ImageFormat thisImageFileFormat asInteger ==> 68002
Smalltalk wordSize ==> 8
nil identityHash ==> 3840
true identityHash ==> 2950
false identityHash ==> 3152

Dave

Chris Muller-3

Re: true hash

In reply to this post by Eliot Miranda-2

> How so? Given that within an image the hash and dictionaries hashed
> there-on are consistent how does it matter if two different images have
> different hashes and different hash values?

As I tried to explain, it matters in the case where the two different
images are accessing a persistent, legacy domain model which has the
true object involved in the calculation of a hash value.

It wouldn't even necessarily have to be a client-server app -- maybe
the "persistent model" is just a serialized object file that the "new"
(inconsistent) image wanted to load.

The way it is now, the system is dependent solely on the identityHash
of true and false and nil, even though I want them to be treated as
equivalent "value" objects just like Integers would be..

Chris Muller-3

Re: true hash

In reply to this post by Nicolas Cellier

> If this can only occur in this persistence scheme, then you obviously
> require a #persistentHash.

No, just the standard "value" #hash is sufficient. A #persistentHash
wouldn't work in the use-case I described because what if they used a
standard Dictionary which bases on #hash -- so the client with the
"new" true wouldn't be able to access.

> It is true that any other literal, arrays of such, or any hash-caring
> object built of such would be cross-image hash-persistent...
> So we are very close to it.

So why be opposed to including true, false, and nil then?

> But YAGNI, your hash sounds hackish...
> Maybe your own image tracer could implement the hack too ?

If anything, I see what we have now as a hack -- because the
correctness of #hash for the **universal *value* of true** is
dependent on the *implementation* of that true -- that it is the same
instance that has existed in all prior Squeaks..

- Chris

Chris Muller-3

Re: true hash

In reply to this post by David T. Lewis

> FWIW, on a 32-bit image:
>
> ImageFormat thisImageFileFormat asInteger ==> 6504
> Smalltalk wordSize ==> 4
> nil identityHash ==> 3840
> true identityHash ==> 2950
> false identityHash ==> 3152
>
> And on a 64-bit image:
>
> ImageFormat thisImageFileFormat asInteger ==> 68002
> Smalltalk wordSize ==> 8
> nil identityHash ==> 3840
> true identityHash ==> 2950
> false identityHash ==> 3152

This is very good news -- still, I see no harm in my proposal. Why
won't someone find some fault with it or at least acknowledge how
horrible the failure-case scenario would be to debug.. :)

Chris Muller-3

Re: true hash

In reply to this post by Eliot Miranda-2

> How do these clients suddenly
> acquire new hashes for nil, true and false?

I should add, it's already happened. In 2009 Levente changed
Object>>#identityHash to answer the scaledIdentityHash.

David T. Lewis

Re: true hash

In reply to this post by Chris Muller-3

On Wed, May 09, 2012 at 06:33:41PM -0500, Chris Muller wrote:

> > FWIW, on a 32-bit image:
> >
> > ImageFormat thisImageFileFormat asInteger ==> 6504
> > Smalltalk wordSize ==> 4
> > nil identityHash ==> 3840
> > true identityHash ==> 2950
> > false identityHash ==> 3152
> >
> > And on a 64-bit image:
> >
> > ImageFormat thisImageFileFormat asInteger ==> 68002
> > Smalltalk wordSize ==> 8
> > nil identityHash ==> 3840
> > true identityHash ==> 2950
> > false identityHash ==> 3152
>
> This is very good news -- still, I see no harm in my proposal. Why
> won't someone find some fault with it or at least acknowledge how
> horrible the failure-case scenario would be to debug.. :)

Nothing horrible is going to happen any time soon, but that does
not make it a good idea. true refers to an object like any other,
and there is no particular reason to expect that the object that
represents "true" in one image should have the same identityHash
as the object that represents "true" in another image.

Consider your multi-user client-server application example. Suppose
that it becomes fabulously successful and scales effortlessly to
support thousands of clients, and you later become interested in
permitting VisualWorks client images to join the party. Oops.

Dave

Nicolas Cellier

Re: true hash

In reply to this post by Chris Muller-3

2012/5/10 Chris Muller <[hidden email]>:
>> How do these clients suddenly
>> acquire new hashes for nil, true and false?
>
> I should add, it's already happened. In 2009 Levente changed
> Object>>#identityHash to answer the scaledIdentityHash.
>

If I had to choose arbitrary constants that would be something stupid like
^36r0true
^36r0false
^36r0nil
but it would cost you a rehash of persistent databases...

Bert Freudenberg

Re: true hash

In reply to this post by Chris Muller-3

On 10.05.2012, at 01:40, Chris Muller wrote:

>> How do these clients suddenly
>> acquire new hashes for nil, true and false?
>
> I should add, it's already happened. In 2009 Levente changed
> Object>>#identityHash to answer the scaledIdentityHash.

Not in Squeak. Our IdentityDictionary uses scaledIdentityHash nowadays, but identityHash itself is left alone, answering the primitive value directly.

You may be thinking of Pharo, where you now need to use basicIdentityHash to get the primitive hash, and identityHash answers the more useful scaled value. Be aware though that Pharo also manipulates SmallInteger hashes where Squeak doesn't (yet, anyway).

Did I mention it's a bad idea to depend on actual hash values? ;)

- Bert -

Nicolas Cellier

Re: true hash

2012/5/10 Bert Freudenberg <[hidden email]>:

>
> On 10.05.2012, at 01:40, Chris Muller wrote:
>
>>> How do these clients suddenly
>>> acquire new hashes for nil, true and false?
>>
>> I should add, it's already happened. In 2009 Levente changed
>> Object>>#identityHash to answer the scaledIdentityHash.
>
> Not in Squeak. Our IdentityDictionary uses scaledIdentityHash nowadays, but identityHash itself is left alone, answering the primitive value directly.
>
> You may be thinking of Pharo, where you now need to use basicIdentityHash to get the primitive hash, and identityHash answers the more useful scaled value. Be aware though that Pharo also manipulates SmallInteger hashes where Squeak doesn't (yet, anyway).
>
> Did I mention it's a bad idea to depend on actual hash values? ;)
>
> - Bert -
>

Sure, I already changed various Number>>hash and could as well change
Point hash to follow recommendations from Andres Valloud's book Hashing
in Smalltalk...

Nicolas

David T. Lewis

Re: true hash

In reply to this post by Bert Freudenberg

On Thu, May 10, 2012 at 12:41:13PM +0200, Bert Freudenberg wrote:

>
> On 10.05.2012, at 01:40, Chris Muller wrote:
>
> >> How do these clients suddenly
> >> acquire new hashes for nil, true and false?
> >
> > I should add, it's already happened. In 2009 Levente changed
> > Object>>#identityHash to answer the scaledIdentityHash.
>
> Not in Squeak. Our IdentityDictionary uses scaledIdentityHash nowadays, but identityHash itself is left alone, answering the primitive value directly.
>
> You may be thinking of Pharo, where you now need to use basicIdentityHash to get the primitive hash, and identityHash answers the more useful scaled value. Be aware though that Pharo also manipulates SmallInteger hashes where Squeak doesn't (yet, anyway).
>
> Did I mention it's a bad idea to depend on actual hash values? ;)
>
> - Bert -

Squeak:
nil identityHash ==> 3840
true identityHash ==> 2950
false identityHash ==> 3152

Pharo:
nil identityHash ==> 1006632960
true identityHash ==> 773324800
false identityHash ==> 826277888
nil basicIdentityHash ==> 3840
true basicIdentityHash ==> 2950
false basicIdentityHash ==> 3152
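
The two lists above are consistent with Pharo's identityHash being the primitive value shifted left by 18 bits, i.e. multiplied by 262144. This is inferred only from the numbers in this post, not from the Pharo sources. A quick workspace check of the arithmetic:

```smalltalk
"Each expression answers true; the operands are the values listed above."
(1 bitShift: 18) = 262144.
(2950 bitShift: 18) = 773324800.      "true"
(3152 bitShift: 18) = 826277888.      "false"
(3840 bitShift: 18) = 1006632960.     "nil"
```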

Paul DeBruicker

DateAndTime hash was: Re: [squeak-dev] true hash

In reply to this post by Nicolas Cellier

On 05/10/2012 04:21 AM, Nicolas Cellier wrote:
> Sure, I already changed various Number>>hash and could as well change
> Point hash to follow recommendations from Andres Valloud's book Hashing
> in Smalltalk...
>
> Nicolas

DateAndTime>>#hash could be changed to :

hash
^ (jdn hashMultiply bitXor: seconds + offset asSeconds) bitXor: nanos

which is 130x faster than what's currently in the image:

hash
^ self asUTC ticks hash

The collision rate on the proposed hash function is 0.04% ( 4 per 10,000 )

Eliot Miranda-2

Re: DateAndTime hash was: Re: [squeak-dev] true hash

On Thu, May 10, 2012 at 6:21 AM, Paul DeBruicker <[hidden email]> wrote:

> On 05/10/2012 04:21 AM, Nicolas Cellier wrote:
>
>> Sure, I already changed various Number>>hash and could as well change
>> Point hash to follow recommendations from Andres Valloud's book Hashing
>> in Smalltalk...
>>
>> Nicolas
>
> DateAndTime>>#hash could be changed to :
>
> hash
> ^ (jdn hashMultiply bitXor: seconds + offset asSeconds) bitXor: nanos
>
> which is 130x faster than what's currently in the image:
>
> hash
> ^ self asUTC ticks hash
>
> The collision rate on the proposed hash function is 0.04% ( 4 per 10,000 )

Just doit. Make that change. If you don't have commit rights, commit it to the inbox. Please!

--
best,

Eliot

Chris Muller-3

Re: true hash

In reply to this post by Bert Freudenberg

>> I should add, it's already happened. In 2009 Levente changed
>> Object>>#identityHash to answer the scaledIdentityHash.
>
> Not in Squeak. Our IdentityDictionary uses scaledIdentityHash nowadays, but identityHash itself is left alone, answering the primitive value directly.

I meant to say Object>>#hash, not #identityHash.

So, before 12/1/2009:

true hash "2950"

but after 12/1/2009

true hash "773324800"

So, any saved persistent EToys ReferenceStream object-model files
with true involved in the calculation of #hash prior to 2009 will now
be goofed up unless you remember to rehash all regular Dictionaries
after loading them. The properties of this bug are:

- it is hidden, you had no idea it was there because no SUnit test
can possibly catch it. It doesn't show until production.
- it is image-specific -- you load the file into an image from before
Levente's change and everything seems fine. What's going on?
- it is "intermittent" because there's a small possibility that, if
the Dictionary were small, you might get lucky with a "hit" anyway
when calculating the slot to start searching at
- it could lead to a corrupt data model, because perhaps the app does
something like #at:ifAbsentPut:, and maybe even on an
otherwise-equivalent object, so you end up with TWO of the "same"
object in the dictionary. What a disaster!
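
The failure mode in the list above can be imitated in a single image with any key whose #hash changes after it has been stored. The workspace sketch below uses an OrderedCollection as a stand-in, on the assumption that its #hash depends on its elements as in current Squeak; it is an analogy for "true hash" differing between images, not a reproduction of the ReferenceStream case.

```smalltalk
| key dict before after repaired |
key := OrderedCollection with: 1 with: 2.
dict := Dictionary new.
dict at: key put: #payload.

"Stored and looked up under the same hash, so this answers true."
before := dict includesKey: key.

"Mutating the key changes its #hash while the entry stays in its old slot."
key add: 3.

"The probe now starts from a slot computed from the new hash. It may
answer false, or true by luck in a small table: the 'intermittent' case."
after := dict includesKey: key.

"Rehashing re-slots every key under its current hash, so this answers true."
dict rehash.
repaired := dict includesKey: key.
```

An #at:ifAbsentPut: issued between the mutation and the rehash is what can leave two entries for the "same" key.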

Now does it make sense?
