Lab 7 - Workloads and Sort Performance
Due: Monday, February 23, 2009th at 11:59PM

Overview

Not all sorts are born equal -- and they are certainly not born the same. The purpose of this lab is to help you to get a hands-on feel for the performance of various sort algorithms. In this lab, we hope that you get a feel for how the size of the list as well as its initial organization affect the performance of each algorithm.

You'll do this by designing workloads for each sort, collecting data, and producing some graphs summarizing these results.

The source code we've given you includes a full implementation of BubbleSort, SelectionSort, InsertionSort, and QuickSort. These sorts are designed as derived types that extend the Sort class. The Sort class includes all of the machinery to time the sorts and a main() method to collect and display the results.

The first thing we'd like you to think about is how the sorts work and consequently how they handle different types of inputs. Some of the sorts that you will use in this lab are stable. Stable sorts perform equally well, no matter how the input data is organized. But, some of the sorts that you will use in this lab are not stable. Their performance will vary drastically, depending on the order of the input data.

Typically, we are sorting a completely unordered collection of data. We'll call this case the average case. But, sometimes we are sorting things that have a particular order already -- perhaps some strange order, perhaps sorted backwards, perhaps already sorted the way we want.

What we'd like you to do is to construct an average case workload for each sort by initializing the array of numbers to random numbers. Then, if possible, we want you to order it in a way that will make the sort run as fast as possible. And, lastly, we want you to organize the input in a way that will make the sort run as slowly as possible.

It is important to realize that not all sorts will have a different best-, average-, or worst-case - it is possible that all orderings will perform roughly the same. That's okay. Remember, stable sorts are not sensitive to the order of the input data. But for others, you should see a clear difference.

Another important detail is that your best- and worst-case workloads should reflect the amount of time taken. You will also be gathering data on the number of swaps used, and this won't necessarily be minimized or maximized in the same workloads. In fact a very slow run may use few swaps, you should notice this result in your data and understand why it happens.

To create a particular workload, all you have to do is put the numbers into the input array, called numbers, in the order that you want. To create the average case array, just put a random number into each element of the array. The comments in our code will help you to understand how things work.

Results and Graphs

An addition to your source code, we'd like you to submit graphs and data tables illustrating the relationship between the number of elements and time to sort them and the number of swaps needed. Since Insertion sort shifts elements rather than swapping, you only need to make a time graph for this sort. This leaves a total of 7 graphs and tables (4 time vs input size and 3 swaps vs input size), each with best- and worst-case lines. Although this sounds like a lot, you can make these graphs very quickly with MS Excel or other software package.

Your graphs will be very informative, showing pronounced differences and similarities among many cases. Perhaps the least intuitive and most remarkable differences are for quicksort, be sure you understand why the sort behaves as it does! If you don't understand the results, or what the best and worst cases are, please ask for help -- that's why we're here.

Suggested plan of attack

1. Write a method to randomize the array and call this from each sort's init method. Use this to verify that the sorts are working properly and gather average case data. This will serve as a benchmark for you use to compare other workloads against.
2. Write the best- and worst-case for the quadratic sorts.
3. Write the best-case for quicksort.
4. Write the worst-case for quicksort.
5. Collect your data and make sure you understand the results. This may take a while, there are lots of sorts and the worst case on 24,000 elements will run very slowly...
6. Whip up some graphs, submit them along with your source, and celebrate!

FAQ

• How do create the `numbers` array?
`int numbers[]` has been declared for you in the `Sort` class. However it has not been initialized, you should do that as necessary like this: `numbers = new int[SIZE];` (where SIZE is the size of the array).
• What is the difference between `++value` and `value++`?
The former returns the new value, while the latter returns the old value. So if we had `int x = 0;` and then tried `if(x++ == 0)` would test true because `x` was 0 before the increment. On the other hand `if(++x == 0)` we would test false because `x` now has value 1.
• Do you want us to the `Random` class?
It's up to you but there are some handy methods there :)
• Do we have to actually create three different array?
The `init` method needs to allocated a fill the numbers array. Depending on the type of workload, you will fill the array in one of three ways. So yes there will be three different arrays, but only one is used at a time.