I am sure the answer is dumb, but I still wonder. At a first glance I thought it could be because you may reuse OOPs. But...if you re-use a OOP is because that object is not anywhere in the system (and hence, there is no need of re-hash any collection).
Another thought I had is none-persisting object and the costs of getting available OOPs vs a simply #identityHash implementation. BTW how long is the #identityHash? In Pharo (before Spur) we had 12 bits (or similar) in the object header so that yield 4k different values. That's not fun when you have large identity-based hashed collection because of collisions. Anyway, how big is the identity hash in Gemstone? Or how "unique" it is? Anyway, I was simply curious about this topic and would like you know your thoughts. _______________________________________________ Glass mailing list [hidden email] http://lists.gemtalksystems.com/mailman/listinfo/glass |
On 10/26/2016 07:51 AM, Mariano Martinez Peck via Glass wrote:
> I am sure the answer is dumb, but I still wonder. At a first glance I > thought it could be because you may reuse OOPs. But...if you re-use a > OOP is because that object is not anywhere in the system (and hence, > there is no need of re-hash any collection). > > Another thought I had is none-persisting object and the costs of getting > available OOPs vs a simply #identityHash implementation. > > BTW how long is the #identityHash? In Pharo (before Spur) we had 12 bits > (or similar) in the object header so that yield 4k different values. > That's not fun when you have large identity-based hashed collection > because of collisions. Anyway, how big is the identity hash in > Gemstone? Or how "unique" it is? > > Anyway, I was simply curious about this topic and would like you know > your thoughts. > Hi Mariano, In GemStone, identity hash is based on the oop. Try looking at an object's oop and its identity hash in hex. There is an identity hash for each possible persistent object, so hash collisions between persistent objects cannot happen. It is possible to have, for instance, a SmallInteger and a regular object with the same identity hash, since a SmallInteger's identity hash is itself. For all types of hashed collections you want to avoid collisions. GemStone is better at this than most other Smalltalks since there is a unique hash per object (excepting immediate objects like SmallIntegers). However, for some kinds of hashed collections (especially open addressing with linear probing) you also need to avoid "clumping" of hash values. What you need are hash values that look random even when the actual keys are not. For SmallIntegers, most Smalltalks make the same mistake -- the hash values of 1 2 3 4 5 are 1 2 3 4 5, when you'd want those hashes to be spread all over the possible range. And oops in GemStone have patterns, too -- they're far from random. So if you're implementing a hashed collection, it's a good idea to put the hashes of the objects through a scrambling function before using them. See #hashMultiply in Pharo for an example scrambling function. Regards, -Martin _______________________________________________ Glass mailing list [hidden email] http://lists.gemtalksystems.com/mailman/listinfo/glass |
In reply to this post by GLASS mailing list
Gs doesn't have the problems common in 32-bit versions of other dialects related to collisions from short #identityHash range. I recall the 32-bit version of VisualWorks answered a 16k value range and that caused huge performance problems for large identity hashed collections. A 4k range must have been a nightmare. Even 32-bit versions of Gs do well with extremely large collections. Paul Baumann On Oct 26, 2016 10:51 AM, "Mariano Martinez Peck via Glass" <[hidden email]> wrote:
_______________________________________________ Glass mailing list [hidden email] http://lists.gemtalksystems.com/mailman/listinfo/glass |
In reply to this post by GLASS mailing list
On Wed, Oct 26, 2016 at 1:19 PM, Martin McClure <[hidden email]> wrote: On 10/26/2016 07:51 AM, Mariano Martinez Peck via Glass wrote: I knew it!!! I knew it! Try looking at an object's oop and its identity hash in hex. hahahah nail it! OK, now I see. There is an identity hash for each possible persistent object, so hash collisions between persistent objects cannot happen. It is possible to have, for instance, a SmallInteger and a regular object with the same identity hash, since a SmallInteger's identity hash is itself. Got it. For all types of hashed collections you want to avoid collisions. GemStone is better at this than most other Smalltalks since there is a unique hash per object (excepting immediate objects like SmallIntegers). Nice!! I was sure you were getting the benefits of the OOP somehow.
Thanks! _______________________________________________ Glass mailing list [hidden email] http://lists.gemtalksystems.com/mailman/listinfo/glass |
In reply to this post by GLASS mailing list
On 10/26/2016 09:19 AM, Martin McClure via Glass wrote:
> hash collisions between persistent objects cannot happen. I should also mention that this only applies to the results of the identityHash *method*, which is only the first half of the hash *function*. The second half of the hash function is up to the collection. It's often the result of the hash method modulo the table size. Once you've done that, you *can* (and almost certainly will) get collisions, and how bad the collisions are depends on the actual data you're putting in the collection (it's always possible to find a data set that performs terribly) and the choice of table size. See Pharo's HashTableSizes class for table sizes that work well for typical data sets. Regards, -Martin _______________________________________________ Glass mailing list [hidden email] http://lists.gemtalksystems.com/mailman/listinfo/glass |
Free forum by Nabble | Edit this page |