Hash

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Hash

Scott Deans
Hi

As a relative newcomer to Dolphin could someone shed some light on what all
this 'hash' business is all about.
Sorry if it's obvious to some, but you got to learn sometime!

Cheers


Scott


Reply | Threaded
Open this post in threaded view
|

Re: Hash

Ian Bartholomew-4
Scott,

> As a relative newcomer to Dolphin could someone shed some light on what
> this 'hash' business is all about.

It can be a big subject so this is just an overview.

Hash is not just a Dolphin thing but applies to all programming languages -
but I'll use Object terms here. The hash of an object is a integral number
that can be used as a reference to the original object. The hash number for
a particular object is obtained by applying the hash function for the class
which that object belongs to, in Dolphin this is done in the various #hash
methods dotted about the image. Hash methods are designed to be efficient
but still give different hash values for each object with different state
_if possible_. e.g. (see Point>>hash for how the following is worked out)

(2 @ 3) hash = 11
(3 @ 2) hash = 14

Two important things to note about hash values.

1) if a equals b then hash a must equal hash b
2) if hash a equals hash b then a may equal b (or it may not)

So - two equal object must have the same hash value but that hash value may
be shared with other objects. This means that as

(10 @ 35) hash = 11

the following situation occurs

(2 @ 3) = (2 @ 3) -> true
(2 @ 3) hash = (2 @ 3) hash -> true

(2 @ 3) = (10 @ 35) -> false
(2 @ 3) hash = (10 @ 35) hash -> true

What is the hash used for?. Lots of things but mainly where one object must
be selected from amongst a number of similar objects.

Say you have a collection of 10000 person objects and you need to find the
person object representing "John Smith". The long way would be

person := people detect: [:each | each name = 'John Smith']

which would take an average of 5000 String comparisons.

Using hashing you would initially create a larger table with, say, 20000,
empty slots. You would then populate this table so (and please note this is
just an example, you need more code to do it properly e.g. cope for index
wrap-round) -

people do: [:each | | index |
    index := each name hash.
    [peopleTable at: index] isNil whileFalse: [index := index + 1].
    peopleTable at: index put: each].

So the idea is that for each person you calculate the hash value of their
name (index). You then look into the new table at that index to see if it is
already occupied (i.e. a previous persons name hashed to the same value -
known as a collision). If so you just keep adding one to the index until you
find an empty slot. When you have found an empty slot you insert your person
at that point (no sniggering at the back there please!!)

When you come to find a person in the table the code is now

index := 'John Smith' hash.
[(peopleTable at: index) name = 'John Smith'] whileFalse: [
    index := index + 1].
person := peopleTable at: index.

As the number of people who you will have to perform the full name
comparison on is _much_ smaller the overall lookup operation will be much
quicker.

The magic is in creating good hash functions and choosing the best size of
table to store your hashed values in. If our peopleTable above had also has
10000 slots then we would gain nothing over a sequential comparison, if it
had 100000 then out lookup would be fast (few collisions) but would waste a
lot of memory.  There are other (better) techniques for dealing with
collisions, chaining all colliding objects into a linked list is a common
one as is a rehash using a different hash function.

Dolphin's Set class using hashing to store it's objects so a browse through
it's methods and a read of some of the comments might help (especially
Set>>findElementOrNil).

Final point. The main situation where you will get involved with hashing is
if you want to override the = method for a class, to provide a better
equality test. Say

Person>>= anotherPerson
    ^(self name = anotherPerson name)
        and: [self idNumber = anotherPerson idNumber])

What you must always do is override hash to ensure that whenever the above
answers true then the hash value for the two objects will also answer true.
It's not as bad as it sounds though.

Person>>hash
    ^self name hash + self idNumber hash

Assuming the equality/hash relationship is correct for Strings (name) and
Integer (idNumber) then the equality/hash relationship will also hold for
Person.

As I said above, there's much more too it but there's also a lot of
literature that explains it much better than I have.

Ian


Reply | Threaded
Open this post in threaded view
|

Re: Hash

Scott Deans
In reply to this post by Scott Deans
Thanks Ian.

It looks like a major topic :)


Scott