Why #identityHash does not rely on OOP?

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Why #identityHash does not rely on OOP?

GLASS mailing list
I am sure the answer is dumb, but I still wonder. At a first glance I thought it could be because you may reuse OOPs. But...if you re-use a OOP is because that object is not anywhere in the system (and hence, there is no need of re-hash any collection). 

Another thought I had is none-persisting object and the costs of getting available OOPs vs a simply #identityHash implementation. 

BTW how long is the #identityHash? In Pharo (before Spur) we had 12 bits (or similar) in the object header so that yield 4k different values. That's not fun when you have large identity-based hashed collection because of collisions.  Anyway, how big is the identity hash in Gemstone? Or how "unique" it is?

Anyway, I was simply curious about this topic and would like you know your thoughts.


_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass
Reply | Threaded
Open this post in threaded view
|

Re: Why #identityHash does not rely on OOP?

GLASS mailing list
On 10/26/2016 07:51 AM, Mariano Martinez Peck via Glass wrote:

> I am sure the answer is dumb, but I still wonder. At a first glance I
> thought it could be because you may reuse OOPs. But...if you re-use a
> OOP is because that object is not anywhere in the system (and hence,
> there is no need of re-hash any collection).
>
> Another thought I had is none-persisting object and the costs of getting
> available OOPs vs a simply #identityHash implementation.
>
> BTW how long is the #identityHash? In Pharo (before Spur) we had 12 bits
> (or similar) in the object header so that yield 4k different values.
> That's not fun when you have large identity-based hashed collection
> because of collisions.  Anyway, how big is the identity hash in
> Gemstone? Or how "unique" it is?
>
> Anyway, I was simply curious about this topic and would like you know
> your thoughts.
>

Hi Mariano,

In GemStone, identity hash is based on the oop. Try looking at an
object's oop and its identity hash in hex. There is an identity hash for
each possible persistent object, so hash collisions between persistent
objects cannot happen. It is possible to have, for instance, a
SmallInteger and a regular object with the same identity hash, since a
SmallInteger's identity hash is itself.

For all types of hashed collections you want to avoid collisions.
GemStone is better at this than most other Smalltalks since there is a
unique hash per object (excepting immediate objects like SmallIntegers).

However, for some kinds of hashed collections (especially open
addressing with linear probing) you also need to avoid "clumping" of
hash values. What you need are hash values that look random even when
the actual keys are not. For SmallIntegers, most Smalltalks make the
same mistake -- the hash values of 1 2 3 4 5 are 1 2 3 4 5, when you'd
want those hashes to be spread all over the possible range. And oops in
GemStone have patterns, too -- they're far from random. So if you're
implementing a hashed collection, it's a good idea to put the hashes of
the objects through a scrambling function before using them. See
#hashMultiply in Pharo for an example scrambling function.

Regards,

-Martin
_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass
Reply | Threaded
Open this post in threaded view
|

Re: Why #identityHash does not rely on OOP?

GLASS mailing list
In reply to this post by GLASS mailing list

Gs doesn't have the problems common in 32-bit versions of other dialects related to collisions from short #identityHash range. I recall the 32-bit version of VisualWorks answered a 16k value range and that caused huge performance problems for large identity hashed collections. A 4k range must have been a nightmare. Even 32-bit versions of Gs do well with extremely large collections.

Paul Baumann


On Oct 26, 2016 10:51 AM, "Mariano Martinez Peck via Glass" <[hidden email]> wrote:
I am sure the answer is dumb, but I still wonder. At a first glance I thought it could be because you may reuse OOPs. But...if you re-use a OOP is because that object is not anywhere in the system (and hence, there is no need of re-hash any collection). 

Another thought I had is none-persisting object and the costs of getting available OOPs vs a simply #identityHash implementation. 

BTW how long is the #identityHash? In Pharo (before Spur) we had 12 bits (or similar) in the object header so that yield 4k different values. That's not fun when you have large identity-based hashed collection because of collisions.  Anyway, how big is the identity hash in Gemstone? Or how "unique" it is?

Anyway, I was simply curious about this topic and would like you know your thoughts.


_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass


_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass
Reply | Threaded
Open this post in threaded view
|

Re: Why #identityHash does not rely on OOP?

GLASS mailing list
In reply to this post by GLASS mailing list


On Wed, Oct 26, 2016 at 1:19 PM, Martin McClure <[hidden email]> wrote:
On 10/26/2016 07:51 AM, Mariano Martinez Peck via Glass wrote:
I am sure the answer is dumb, but I still wonder. At a first glance I
thought it could be because you may reuse OOPs. But...if you re-use a
OOP is because that object is not anywhere in the system (and hence,
there is no need of re-hash any collection).

Another thought I had is none-persisting object and the costs of getting
available OOPs vs a simply #identityHash implementation.

BTW how long is the #identityHash? In Pharo (before Spur) we had 12 bits
(or similar) in the object header so that yield 4k different values.
That's not fun when you have large identity-based hashed collection
because of collisions.  Anyway, how big is the identity hash in
Gemstone? Or how "unique" it is?

Anyway, I was simply curious about this topic and would like you know
your thoughts.


Hi Mariano,

In GemStone, identity hash is based on the oop.

I knew it!!! I knew it!
 
Try looking at an object's oop and its identity hash in hex.

hahahah nail it!  OK, now I see. 
 
There is an identity hash for each possible persistent object, so hash collisions between persistent objects cannot happen. It is possible to have, for instance, a SmallInteger and a regular object with the same identity hash, since a SmallInteger's identity hash is itself.


Got it. 
 
For all types of hashed collections you want to avoid collisions. GemStone is better at this than most other Smalltalks since there is a unique hash per object (excepting immediate objects like SmallIntegers).


Nice!! I was sure you were getting the benefits of the OOP somehow.
 

However, for some kinds of hashed collections (especially open addressing with linear probing) you also need to avoid "clumping" of hash values. What you need are hash values that look random even when the actual keys are not. For SmallIntegers, most Smalltalks make the same mistake -- the hash values of 1 2 3 4 5 are 1 2 3 4 5, when you'd want those hashes to be spread all over the possible range. And oops in GemStone have patterns, too -- they're far from random. So if you're implementing a hashed collection, it's a good idea to put the hashes of the objects through a scrambling function before using them. See #hashMultiply in Pharo for an example scrambling function.

Thanks! 


--

_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass
Reply | Threaded
Open this post in threaded view
|

Re: Why #identityHash does not rely on OOP?

GLASS mailing list
In reply to this post by GLASS mailing list
On 10/26/2016 09:19 AM, Martin McClure via Glass wrote:
> hash collisions between persistent objects cannot happen.

I should also mention that this only applies to the results of the
identityHash *method*, which is only the first half of the hash
*function*. The second half of the hash function is up to the
collection. It's often the result of the hash method modulo the table
size. Once you've done that, you *can* (and almost certainly will) get
collisions, and how bad the collisions are depends on the actual data
you're putting in the collection (it's always possible to find a data
set that performs terribly) and the choice of table size. See Pharo's
HashTableSizes class for table sizes that work well for typical data sets.

Regards,

-Martin
_______________________________________________
Glass mailing list
[hidden email]
http://lists.gemtalksystems.com/mailman/listinfo/glass