Hi
As a relative newcomer to Dolphin could someone shed some light on what all this 'hash' business is all about. Sorry if it's obvious to some, but you got to learn sometime! Cheers Scott |
Scott,
> As a relative newcomer to Dolphin could someone shed some light on what > this 'hash' business is all about. It can be a big subject so this is just an overview. Hash is not just a Dolphin thing but applies to all programming languages - but I'll use Object terms here. The hash of an object is a integral number that can be used as a reference to the original object. The hash number for a particular object is obtained by applying the hash function for the class which that object belongs to, in Dolphin this is done in the various #hash methods dotted about the image. Hash methods are designed to be efficient but still give different hash values for each object with different state _if possible_. e.g. (see Point>>hash for how the following is worked out) (2 @ 3) hash = 11 (3 @ 2) hash = 14 Two important things to note about hash values. 1) if a equals b then hash a must equal hash b 2) if hash a equals hash b then a may equal b (or it may not) So - two equal object must have the same hash value but that hash value may be shared with other objects. This means that as (10 @ 35) hash = 11 the following situation occurs (2 @ 3) = (2 @ 3) -> true (2 @ 3) hash = (2 @ 3) hash -> true (2 @ 3) = (10 @ 35) -> false (2 @ 3) hash = (10 @ 35) hash -> true What is the hash used for?. Lots of things but mainly where one object must be selected from amongst a number of similar objects. Say you have a collection of 10000 person objects and you need to find the person object representing "John Smith". The long way would be person := people detect: [:each | each name = 'John Smith'] which would take an average of 5000 String comparisons. Using hashing you would initially create a larger table with, say, 20000, empty slots. You would then populate this table so (and please note this is just an example, you need more code to do it properly e.g. cope for index wrap-round) - people do: [:each | | index | index := each name hash. [peopleTable at: index] isNil whileFalse: [index := index + 1]. peopleTable at: index put: each]. So the idea is that for each person you calculate the hash value of their name (index). You then look into the new table at that index to see if it is already occupied (i.e. a previous persons name hashed to the same value - known as a collision). If so you just keep adding one to the index until you find an empty slot. When you have found an empty slot you insert your person at that point (no sniggering at the back there please!!) When you come to find a person in the table the code is now index := 'John Smith' hash. [(peopleTable at: index) name = 'John Smith'] whileFalse: [ index := index + 1]. person := peopleTable at: index. As the number of people who you will have to perform the full name comparison on is _much_ smaller the overall lookup operation will be much quicker. The magic is in creating good hash functions and choosing the best size of table to store your hashed values in. If our peopleTable above had also has 10000 slots then we would gain nothing over a sequential comparison, if it had 100000 then out lookup would be fast (few collisions) but would waste a lot of memory. There are other (better) techniques for dealing with collisions, chaining all colliding objects into a linked list is a common one as is a rehash using a different hash function. Dolphin's Set class using hashing to store it's objects so a browse through it's methods and a read of some of the comments might help (especially Set>>findElementOrNil). Final point. The main situation where you will get involved with hashing is if you want to override the = method for a class, to provide a better equality test. Say Person>>= anotherPerson ^(self name = anotherPerson name) and: [self idNumber = anotherPerson idNumber]) What you must always do is override hash to ensure that whenever the above answers true then the hash value for the two objects will also answer true. It's not as bad as it sounds though. Person>>hash ^self name hash + self idNumber hash Assuming the equality/hash relationship is correct for Strings (name) and Integer (idNumber) then the equality/hash relationship will also hold for Person. As I said above, there's much more too it but there's also a lot of literature that explains it much better than I have. Ian |
In reply to this post by Scott Deans
Thanks Ian.
It looks like a major topic :) Scott |
Free forum by Nabble | Edit this page |