CSC 241- Set #11 (Starting December 3, 1997)


Hash Table

Hash table is a great data structure for implementing keyedCollections. There are a number variations in implementing hash tables. What we will discuss here will involve collision resolution by chaining which is only one of the many techniques for addressing collisions.

The basic concept here is that you have keyed objects that need to be maintained in a structure. The key of such a keyed object is mapped to an array cell based on a hash function that is designed to randomly distribute these objects in the array. Collision resolution with chaining is a technique for dealing with multiple keys mapping to the same array cell; this technique allows us to create a linked list of nodes with keyed objects. The only way to avoid needing a collision resolution technique is by having enough cells where each key maps to a unique cell in the array; which is typically not practical. Let us look at an example to understand hwo it all works.

Lets assume that our array has seven cells in it, thus the hash function needs to generate a number from 0 to 6. Assuming a simple int key, we can use modulus arithmetic to map the key to an int number from 0 to 6. key % 7 yields a number from 0 to 6. A few examples: if key is 10, the function will yield a 3. If key is 45, the function yields a 3. If key is 19, the function yields a 5. If key is 63, the function yields a 0.

lets put these keys in our table assuming they are inserted in the sequence shown above:


hash table
  ---     ----
0| --|-->|63--|-->
 |---|    ----
1| --|-->
 |---|
2| --|-->
 |---|    ----     ----
3| --|-->|45--|-->|10--|-->
 |---|    ----     ----
4| --|-->
 |---|    ----    
5| --|-->|19--|-->
 |---|    ----
6| --|-->
 |---|
7| --|-->
  ---

Hash tables are great for maintaining keyed data but notice that the information here is really not ordered in a way that is useful directly. In other words, we can't just generate a sorted list of keys by visiting all nodes in the table. This is an important consideration when deciding on implementations. For example, if the application needs the ability to readyly generate an ordered list of employees, the BST may be a better choice. But, with a reasonable number of cells put aside and a hash function that distributes values well, hash table compares well with BST when performing the operations required in keyedCollection. It is only when you need to extend one or the other to have additional functionality that the advantage of BST comes into play.

Hash Table Implementation

We first need a node class to implement the linked list that we need for representing chains:


class hashNode {
   Keyed element;
   hashNode next;
   hashNode (Keyed k, hashNode n) {
     next=n;
     element=k;
   }
}

as you can see, it is simple linked list node. with a keyed element and a reference to the next node.

In looking at the hashTable class, note that there are two state variables. The value for size is determined at construction and gives the number of cells to put aside for the table. t[] is the table; it is simply an array of hashNodes.

The constructor sets size and assigns null to all cells of the array, indicating that initially there are no values in the table.

hash is a private method, not accessible outside the class. Math.abs(x.hashCode()) % size; is what this function returns; it really is the same as the function I shared with you earlier in our example. x.hashCode() returns a whole number that is unique for the string held in x; however, this number may be negative which is why we use Math.abs() to make sure we have its absolute value. % provides the remainder of performing a division of Math.abs(x.hashCode()) by size.

add method

Very simple process! We again use find to make sure the key is not already in the table as we did with our tree implementation. if we know that the value doesn't exist, we simply create a new node with the new keyed object and place it in front of the chain that we hash into. The instruction: t[hash(x.key())]=new hashNode(x,t[hash(x.key())]); accomplishes this. hash(x.key()) provides the index of the cell that we hash into, the rest should make sense as we are inserting in front of the chain.

As part of lab5 you have been asked to remove the call to find from the add method in Tree. The intent there is that you have to visit the same path in the tree twice for first performing a find and then the actual add for the new element. You should notice that I don't ask that you do that for the hash table implementation of keyed collection. The reason is simple, whether you use find or let add do it, you have to map into the appropriate cell in the table and search for the key to make sure it doesn't already exist in the table. Performance wise, there is no benefit in having add do this search as once we know the value doesn't exist already, we just add in front of the chain and there is no need to traverse the chain twice.

remove method

The remove method is designed to hash into the appropriate cell, locate the node that needs to be removed and change references to accomplish this.

First it makes sure that the chain actually exists, if it does not the object can't exist in the table and the exception is thrown. We then check to see if the object is front of the chain which is a special case and must be handled separately from the cases where the object is in any other node of the chain. When it is the first node that is being removed, the content of the cell in the table that references it changes. t[hash(key)]=t[hash(key)].next; replaces the node in the cell from the hash table with the next node on its chain.

Once we establish that we need to traverse the chain, we establish a prev node reference. prev will always point to the node before the "current" node being examined. Once the node with the object being deleted is located, next in prev is replaced by the next in the current node, effectively removing the current node from the chain. The for loop:
for(hashNode temp=prev.next;temp != null;prev=temp,temp=temp.next) is designed to start temp which will reference our current node to start from the second node in the chain. The loop stops when temp is null. After each iteration, temp and prev are both moved forward.

Inside the loop, we check to see if the element that we want to remove is referenced by temp, if it is, prev.next=temp.next; updates the next reference in prev and we simply return from the method (i.e. we don't let the loop check all nodes and terminate when temp is null).

If the loop terminates without us returning as a result of finding the intended object, we throw and exception.