Hi,
I was looking at the Collection>>#hash implementation and found something strange:

self size <= 10 ifTrue: [self do: [:elem | hash := hash bitXor: elem hash]].

Does anyone know why the contents are only considered for the hash if the collection size is at most 10? Is that a bug or a feature? Moreover, it seems to contradict the end of the comment: -- two equal objects have equal hash values

TIA, Cheers,
Vincent
Hi, it does not look like a good hash (in terms of probability of collisions) and would deserve a revision. But what problem did you detect wrt equality? The implication goes this way:

self assert: (a = b) ==> (a hash = b hash)

Since the element hashes are bitXor'ed, the resulting hash will be independent of the ordering of elements. It will thus work for unordered collections (like Set, in which ordering depends on container size) -- and badly for ordered ones if they omit to refine hash. Or is it because Collections of different species could be equal?

Nicolas
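To make the order-independence argument concrete outside of Pharo, here is a minimal Python sketch of the same XOR fold. The function name `xor_hash` is a hypothetical stand-in for the Smalltalk loop, not an actual Pharo API:

```python
from functools import reduce

def xor_hash(elements, seed=0):
    """Fold element hashes together with XOR, mirroring the Smalltalk
    'hash := hash bitXor: elem hash' loop. Because XOR is commutative
    and associative, iteration order does not affect the result."""
    return reduce(lambda h, e: h ^ hash(e), elements, seed)

a = [1, 2, 3, 4]
b = [4, 3, 2, 1]  # same elements, different order
assert xor_hash(a) == xor_hash(b)
```

This is exactly why the fold is safe for a Set, whose internal ordering depends on the container size.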
From: Pharo-dev [mailto:[hidden email]]
On Behalf Of Nicolas Cellier

Hi,

> Hi, it does not look like a good hash (in terms of probability of collisions) and would deserve a revision. But what problem did you detect wrt equality?

None, I was just wondering why the limit is set at 10 and whether a problem can arise from it.

> The implication goes this way: self assert: (a = b) ==> (a hash = b hash) Since the element hashes are bitXor'ed, the resulting hash will be independent of the ordering of elements. It will thus work for unordered collections (like Set, in which ordering depends on container size) -- and badly for ordered ones if they omit to refine hash.

Thx for the explanation!

> Or is it because Collections of different species could be equal?

I was looking at the dictionaries' #= and #hash. And indeed, it is not a problem there. But the hash used is really not efficient for unordered collections of unordered collections of the same size.

Vincent
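That weakness for nested unordered collections is easy to exhibit. The following Python sketch (with a hypothetical `xor_hash` mirroring the Smalltalk XOR fold, not a Pharo API) builds two different collections of sets whose hashes collide, because the inner XOR hashes cancel each other out in the outer fold:

```python
def xor_hash(elements):
    """XOR-fold the hashes of the elements, as Collection>>#hash does."""
    h = 0
    for e in elements:
        h ^= hash(e)
    return h

# Two *different* lists of sets whose XOR hashes collide.
# Python hashes small positive ints to themselves, so:
s1 = [{1, 2}, {3}]  # inner hashes: 1^2 = 3 and 3
s2 = [{1, 3}, {2}]  # inner hashes: 1^3 = 2 and 2

h1 = xor_hash(xor_hash(inner) for inner in s1)  # 3 ^ 3 = 0
h2 = xor_hash(xor_hash(inner) for inner in s2)  # 2 ^ 2 = 0
assert h1 == h2 and s1 != s2
```

With a pure XOR fold, any two equal inner hashes cancel to zero, so collections of collections collide far more often than a good hash should allow.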
2018-06-21 0:00 GMT+02:00 <[hidden email]>:
Indeed, but remember that we do not even have a guarantee that Collections are of finite size... I would expect them to be countable (by virtue of do: we can iterate over the elements, possibly in an infinite loop). So it would make sense to refine hash in Set/Dictionary/HashedCollection and remove the size limitation there... Maybe something like:

^self inject: self size hash into: [:hash :elem | hash bitXor: elem hash]
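For readers less familiar with inject:into:, here is a Python sketch of what that proposed refinement computes; `collection_hash` is a hypothetical name, and seeding with the size's hash is the only change from the plain XOR fold:

```python
from functools import reduce

def collection_hash(collection):
    """Mirror of the proposed Smalltalk refinement:
    ^self inject: self size hash into: [:hash :elem | hash bitXor: elem hash]
    Seed the fold with the hash of the size, then XOR in each element's hash."""
    return reduce(lambda h, e: h ^ hash(e), collection, hash(len(collection)))

# Still order-independent, so still suitable for unordered collections:
assert collection_hash({1, 2, 3}) == collection_hash(frozenset({3, 2, 1}))
```

Mixing in the size at least separates collections of different sizes, though the fold remains XOR-based and so keeps the cancellation weakness discussed above.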
2018-06-21 0:16 GMT+02:00 Nicolas Cellier <[hidden email]>:
Oh, and I forgot: you could get yourself the book by Andres Valloud; it will give you pretty deep insight into hashing algorithm efficiency (and its measurement).