Now it's time to consider performance issues in a more analytical way. Thinking about how we can improve the performance of systems has always interested me. Analysis of algorithms is the formal study of what lies at the core of system performance: the algorithms that perform the tasks we want done. Algorithm complexity is typically discussed in computer science using asymptotic time complexity analysis (Big-O notation). Here we are interested in the order of magnitude of an algorithm's complexity. For example, if you have a set of values in an array, how many cells do you have to visit in order to find the element you are interested in? As it turns out, our choice of data structures is also critical when considering performance. Algorithms are often driven by the way we decide to maintain our data.
When we discussed circular queues and used the circular queue concept in implementing our fixed queue, I described how they saved us from having to move all the elements back when we dequeued. Think about the doubly linked list that we used for representing the dynamic version of our queue. The doubly linked list, along with the reference variables keeping track of the front and back of the list, enabled us to perform an enqueue or dequeue with just a couple of instructions and no iteration. If we had a singly linked list and only one reference to the front of the list, we would have had to find the end of the queue each time an enqueue occurred. That would mean traversing the list from the beginning to the last node and then adding a new node after it.
Now, where does the order of magnitude come into play? Both the dequeue in our fixed version queue and the enqueue in our dynamic version are algorithms considered O(1). When the complexity is O(1), we are saying it doesn't matter how many elements we have in our queue: there are no loops, and performing a fixed set of instructions once completes the task.
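To make that concrete, here is a minimal sketch of the idea (not our actual queue class; the Node and LinkedQueue names are just for illustration). With references to both ends of a doubly linked list, enqueue and dequeue each touch only a fixed number of fields, so both are O(1):

class Node {
    int data;
    Node next, prev;
    Node(int data) { this.data = data; }
}

class LinkedQueue {
    private Node front, back;

    void enqueue(int value) {           // O(1): no loop, just reference updates
        Node n = new Node(value);
        if (back == null) {             // queue was empty
            front = back = n;
        } else {
            n.prev = back;
            back.next = n;
            back = n;
        }
    }

    int dequeue() {                     // O(1): again a fixed handful of steps
        int value = front.data;         // (assumes the queue is not empty)
        front = front.next;
        if (front == null) back = null; // queue became empty
        else front.prev = null;
        return value;
    }
}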
The alternative approaches discussed above require traversing the whole list. These are considered O(n) algorithms. When the complexity is O(n), the more elements we have, the longer the task takes: we iterate through a set of instructions some number of times that depends on the number of elements in our queue.
Understanding why the above algorithms are O(1) or O(n) is not too hard. But what about cases where we need to visit a list of elements twice, or where we need to visit one half of a list? How about cases where we need to do a few preliminary operations and then iterate through a list? Simple: constant multipliers and additions don't get counted when determining complexity. We are interested in magnitude.
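For instance, consider a hypothetical method that makes two full passes over an array. It performs about 2n cell visits, but the multiplier 2 is ignored and the method is still O(n):

int minPlusMax(int[] a) {               // roughly 2n visits in total
    int min = a[0];
    for (int i = 1; i < a.length; i++)  // first pass: about n visits
        if (a[i] < min) min = a[i];
    int max = a[0];
    for (int i = 1; i < a.length; i++)  // second pass: about n more visits
        if (a[i] > max) max = a[i];
    return min + max;                   // 2n is still O(n): magnitude, not multipliers
}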
A search algorithm that looks for something sequentially in a list of n elements has a complexity of O(n). If we are dealing with an ordered list, on average it needs to check half of the elements before it can decide whether the element is in the list. Half of n is still proportional to n, so its performance is still considered O(n).
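A sketch of such a sequential search over an ordered array might look like this (the names list, size, and sequentialFind are just for illustration):

boolean sequentialFind(int[] list, int size, int searchVal) {
    for (int i = 0; i < size; i++) {
        if (list[i] == searchVal)
            return true;               // found it after i+1 visits
        if (list[i] > searchVal)
            return false;              // ordered list: we have passed the spot
    }                                  //   where searchVal would have been
    return false;                      // visited all n cells: the O(n) worst case
}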
Suppose we need to read a whole list of elements in, sort the list, and then write the newly ordered elements back to the same file. Using a sort algorithm like selection sort, as we will discuss later, gives us O(n^2) behavior. So we have O(n) for reading the data in, O(n^2) for sorting the data, and O(n) for writing the data back. What is the order of magnitude in cases such as this, where several subtasks are involved? It is O(n^2): the subtask with the highest order of magnitude determines the complexity of the whole. It makes NO sense to say something like "this program's complexity is O(n^2) + 2O(n)".
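Here is a self-contained sketch of that dominance rule (the class and variable names are made up, an array stands in for the file, and the sort is the selection sort we will look at later):

import java.util.Arrays;

class SortFile {
    static void selectionSort(int[] data) {            // the O(n^2) subtask
        for (int i = 0; i < data.length - 1; i++) {
            int min = i;
            for (int j = i + 1; j < data.length; j++)
                if (data[j] < data[min]) min = j;
            int tmp = data[i]; data[i] = data[min]; data[min] = tmp;
        }
    }

    public static void main(String[] args) {
        int[] file = {40, 10, 30, 20};
        int[] data = Arrays.copyOf(file, file.length); // "read in":    O(n)
        selectionSort(data);                           // "sort":       O(n^2)
        System.out.println(Arrays.toString(data));     // "write back": O(n)
    }   // O(n) + O(n^2) + O(n) is simply O(n^2): the biggest subtask wins
}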
We also need to discuss the space requirements of algorithms versus their timing requirements. When we analyze algorithms, we need to be concerned about both.
Suppose we have a list of employee records that we wish to keep in memory. Let us assume that the records are kept in the order of their SS# and that we wish to locate them based on it.
We can obviously keep the employee records in an array of a reasonable size based on the number of employees we have. One way to search for a specific employee is to search sequentially. As mentioned before, since the list is kept in order, on average we need to visit half of the list to locate an employee (O(n) complexity). If we have 1000 employees, we need to visit 500 cells on average.
An alternative and superior approach is to utilize a binary search technique. Binary search performance is O(log2 n); more about that later. So, for 1000 employees, we can find a particular SS# in about 10 iterations.
There is yet another alternative that is even faster. Imagine that, instead of putting aside say 1000 spaces in an array for our employees, we put aside 1,000,000,000 spaces: one for every possible 9-digit SS#. Now each SS# has its own cell, so if the SS# that we want is 546324532, we can go directly to that cell and process the employee. What is our performance? As far as timing is concerned it is O(1), which is great. But we had to allocate a million times more space than we needed: the space is proportional to the size of the whole key space, not to the number of employees we actually have. Even if we only kept 1000 bytes of information for each employee, we would need to put aside 1000 gigabytes (a terabyte!) of memory for the employee array. Have you ever worked on a computer with 1000 gigabytes of RAM? I have never seen one!
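A sketch of this direct-addressing idea (the Employee class here is hypothetical) shows both the O(1) lookup and the absurd allocation:

class Employee {
    int ssn;
    String name;
}

class DirectAddressTable {
    // one cell for every possible 9-digit SS# -- 10^9 cells, however few
    // employees we actually have (do not try to allocate this for real!)
    private Employee[] table = new Employee[1_000_000_000];

    void add(Employee e)   { table[e.ssn] = e; }   // O(1)
    Employee find(int ssn) { return table[ssn]; }  // O(1): straight to the cell
}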
The point here is simple: speed is not the only issue at hand when considering complexity. How much space is needed also affects your choice of algorithm.
As mentioned before, binary search assumes an ordered list. The algorithm considers the middle of the list first. If the data being searched for is in the middle, you are obviously done searching. If not, the search value is compared to see whether it is smaller or greater than what is in the middle. Suppose it is smaller; then the whole right half no longer needs to be considered. You treat the left half the same way you treated the whole list: go to its middle and check in the same way, considering smaller and smaller sections of the list until you find the element or decide that it is not in the list.
Let us now look carefully at why binary search is O(log2 n).
Here is the algorithm for binary search. Note that the method returns true if the search value is found; otherwise, it returns false. We assume that an ordered list l, along with a variable size that keeps track of the number of cells occupied, is available to us in this method.
boolean find(int searchVal) {
    int first = 0;
    int last = size - 1;
    while (first <= last) {              // the search space is not empty yet
        int middle = (last + first) / 2;
        if (l[middle] == searchVal)
            return true;                 // found it
        if (l[middle] < searchVal)
            first = middle + 1;          // dismiss the left half
        else
            last = middle - 1;           // dismiss the right half
    }
    return false;                        // search space is empty: not in the list
}
Consider the following list:

size=10
---------------------------------------------------
| 10 | 17 | 20 | 30 | 35 | 40 | 43 | 60 | 65 | 90 |
---------------------------------------------------
   0    1    2    3    4    5    6    7    8    9
Let's trace the algorithm searching for 60:
first=0 last=9
---------------------------------------------------
| 10 | 17 | 20 | 30 | 35 | 40 | 43 | 60 | 65 | 90 |
---------------------------------------------------
   0    1    2    3    4    5    6    7    8    9
middle=4               ^

first=5 last=9
---------------------------------------------------
| 10 | 17 | 20 | 30 | 35 | 40 | 43 | 60 | 65 | 90 |
---------------------------------------------------
   0    1    2    3    4    5    6    7    8    9
middle=7                              ^ return true;
Let's trace the algorithm searching for 18:
first=0 last=9
---------------------------------------------------
| 10 | 17 | 20 | 30 | 35 | 40 | 43 | 60 | 65 | 90 |
---------------------------------------------------
   0    1    2    3    4    5    6    7    8    9
middle=4               ^

first=0 last=3
---------------------------------------------------
| 10 | 17 | 20 | 30 | 35 | 40 | 43 | 60 | 65 | 90 |
---------------------------------------------------
   0    1    2    3    4    5    6    7    8    9
        ^ middle=1

first=2 last=3
---------------------------------------------------
| 10 | 17 | 20 | 30 | 35 | 40 | 43 | 60 | 65 | 90 |
---------------------------------------------------
   0    1    2    3    4    5    6    7    8    9
middle=2     ^

first=2 last=1
first > last, so the loop ends: return false;
In the first trace only 2 cells had to be examined, and in the second only 3. Binary search locates an element with log2 n or fewer cell visits; 1000 elements would require about 10 visits. Try it out! Each time we iterate, we dismiss half of the list or sublist being considered. This halving is the logarithmic behavior that we see time and again in algorithm analysis and design.
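To see where the 10 comes from, watch the size of the section still under consideration when we start with 1000 elements (each step keeps roughly half):

1000 -> 500 -> 250 -> 125 -> 62 -> 31 -> 15 -> 7 -> 3 -> 1

That is 10 list sizes, so at most about 10 cells get examined, matching log2 1000, which is about 10 (since 2^10 = 1024).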
Let us consider selection sort again (we looked at it a few weeks ago as an example of manipulating intList objects). The basic approach in this sort technique is to find the cell with the smallest value and interchange its content with what is in the first cell. Then you locate the second smallest and interchange it with what is in the second cell. This process continues until all elements are placed where they belong.
Here is a code segment that implements selection sort. data is our
array of elements and n tells us how many elements there are in
the list.
for (int i = 0; i < n - 1; i++) {
    int min = i;                       // assume the smallest so far is at i
    for (int j = i + 1; j < n; j++)    // scan the rest of the list
        if (data[j] < data[min])
            min = j;                   // remember the smallest seen so far
    int tmp = data[i];                 // swap data[i] with the smallest
    data[i] = data[min];
    data[min] = tmp;
}
Consider this list and follow the trace as the sort algorithm is applied
(indices 0 through 3):

i=0 min=0 j=1:  | 10 | 12 |  9 | 11 |   12 > 10, min stays 0
i=0 min=0 j=2:  | 10 | 12 |  9 | 11 |    9 < 10, min becomes 2
i=0 min=2 j=3:  | 10 | 12 |  9 | 11 |   11 >  9, min stays 2
swap data[0] and data[2]:   |  9 | 12 | 10 | 11 |
i=1 min=1 j=2:  |  9 | 12 | 10 | 11 |   10 < 12, min becomes 2
i=1 min=2 j=3:  |  9 | 12 | 10 | 11 |   11 > 10, min stays 2
swap data[1] and data[2]:   |  9 | 10 | 12 | 11 |
i=2 min=2 j=3:  |  9 | 10 | 12 | 11 |   11 < 12, min becomes 3
swap data[2] and data[3]:   |  9 | 10 | 11 | 12 |   D O N E
How many cells were visited? All 4 to find the smallest, the 3 remaining cells for the next number, and 2 cells for the third number: 9 visits in all.
In general, we need to think:

n + (n-1) + (n-2) + ... + 2  =  n(n+1)/2 - 1  =  (1/2)n^2 + (1/2)n - 1

This is considered to be O(n^2): think magnitude! The n^2 term dominates, and the constant multiplier 1/2 doesn't get counted.