CSC 241- Lab 5 (Due May 9, 2003) [60 Points]


Theme ...

You will learn ...

In this lab, you will work with the Set Package. You start by copying the Set interface, its two implementations Tree and hashTable, along with their supporting node classes. You will also copy the test programs I have designed for these classes.

Set objects act as container for unique entities. All elements stored in a Set must implement the Comarable Interface. String is an example of a class that implements Comparable. Comparables are compared to other like objects with their CompareTo method. This characteristic is important when dealing with sets, we need to be able to distinguish and compare our objects that we store in them. With the ability to identify elements uniquely and comparing them to see is they are the same or one is less than the other, we are enabled to store them in a way where we can find them quickly.


Create directories and copy some files

  1. cd public-html/classes/csc241
  2. cp -r ~mohammad/public-html/classes/csc241/Set/ . -- don't forget the dot or the -r.
  3. From public-html/classes/csc241 directory, type ls -R Set. The list should look like:
    Set:
    Set.java                 hashNode.java   treeNode.java
    Tree.java                hashTable.java  testSet
    
    Set/testSet:
    elementForTest.java  test1.java
    test2.java
    

Compile all Java files and run the test program

in the testSet directory, test1 and test2 are obviously test programs. test1 randomly generates String objects and stores them in a Set object and then calls the dump() method in the Set object. test2 randomly generates elementForTest objects and stores them in a Set object. It also, randomly invokes other operations, like find, and remove to test them as well. elementForTest objects are Comparable, they have a key_ field that uniquely identifies them.

  1. Use javac *.java in each directory to compile all of the above .java files.
  2. Run the program that tests the two implementation of Set.

Compare Tree against hash table

First lets perform a few experiments to compare performances for hash tables and binary search trees. We need a bigger pool of data than what test1 and test2 use to test these data structures with; just inserting 100 or so elements and performing 100 or so update operations on these structures will not give you a definitive conclusion on their performance.

The test2 program is already designed to calculate the elapsed time (i.e. how long it took to perform the operations); modify it to test for inserting 10,000 elements and 1,000 randomly chosen update operations after the insertion are complete. Two thing that you should not over look:


Run the test program with tree as its argument to test the tree version and collect the elapsed time for 5 runs and record their average.

Run the test program with hash as its argument and the size 10 again collecting the elapsed times for 5 runs and recording their average.

What happens if you use a size argument that is larger than just 10 when testing the hash version? Is there a point where hashtable performs better than BST. When testing for higher hash table sizes, be sure to average the results of 5 runs in each case before recording them.

In response to the questions in this part, in your email, include:

  1. what the average elapsed time for the BST was.
  2. a table showing what sizes you tested the hash table for and
  3. what does your analysis yield when comparing the performance of these data structures.
Tune the implementations

In this section, you will modify some of the code for both data structures and again through experimentation respond to a performance question.

The way things work now, the actual process of inserting or deleting elements don't have to bother with the possibility of either the element not being there, when it should be, or it being there, when it shouldn't be. The idea here is simple, by using find, you traverse the data structure looking for the element; in remove, if it is not there, we do nothing; in add, we do the opposite, if it is already there, we do nothing. By using find in add and remove, del or insert don't have to deal with an object not existing when it should, or exiting when it shouldn't be.

Remove the calls to the find method. The insert and del methods in the Tree version need to be modified to deal with the described anomalies. In insert you must consider that the element may already be in the tree and if it is, you must simply return root. In del you must consider that the element might not be in the tree and if it is not, just return null; you only determine an object is not in the tree when you root is null.

hashTable needs to be updated in a similar way. the difference is that add and remove don't call any recursive methods, thus, they themselves have to be changed to find the elements.

test changes that you make with test2 (i.e. with output), to make sure these methods still work correctly.

Once you know they work correctly, you need to repeat the analysis that you performed in the previous section in order to respond to the following question:

Can the performance of either Tree or hashTable be improved by making the changes specified here.

Submit the code for both Tree and hashTable with the modifications.