Project 4: Guidelines
As in the previous projects, create a directory named "FirstAndrewID-SecondAndrewID" at:
If you need to submit updated versions, create new directories, naming them
"FirstAndrewID-SecondAndrewID.2", "FirstAndrewID-SecondAndrewID.3", etc.
We will look at only the newest version.
Important Things to Note
Provide a Makefile or something equivalent (e.g., for ant).
We will need to be able to compile your submitted code on Andrew machines.
In particular, we will not be able to use Eclipse for importing and compiling your project.
All your source code should be placed in a directory named "src"
under the root of your submission directory.
Avoid any hard-coded parameters, such as host names, path names, and port numbers.
Submit your report as a PDF named "report.pdf" under the root of your submission directory.
Make sure to explain the overall design of your system, not just
descriptions of classes you implemented. E.g., what are the components
of your system and what are their roles? How do they interact?
Test your implementation on Andrew machines before submission.
Provide step-by-step instructions for running and testing
your code, so that people outside your team are able to do so easily.
Please do not submit a revision history with your code.
E.g., no .git in your source directory.
Make your code readable by others, with proper indentation and comments
placed as appropriate.
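One way to follow the guideline about hard-coded parameters is to read host names, port numbers, and path names from the command line. The following is a minimal sketch in Python, not a required interface: the option names (--host, --port, --input) are hypothetical, and the same idea carries over to whichever language you use.

```python
import argparse


def parse_args(argv=None):
    """Read run-time parameters from the command line instead of hard-coding them.

    The option names below are hypothetical; pick whatever suits your system.
    """
    parser = argparse.ArgumentParser(description="Illustrative launcher")
    parser.add_argument("--host", required=True, help="host name to connect to")
    parser.add_argument("--port", type=int, required=True, help="port number")
    parser.add_argument("--input", required=True, help="path to the input data set")
    return parser.parse_args(argv)
```

Calling parse_args() with no argument reads the options from the actual command line; passing a list of strings is convenient for testing.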
Please note that submissions not following these guidelines can significantly affect the grading process.
Make sure you name your submission directory in the way specified above, so we can identify and handle
each submission properly. If you want to submit a compressed archive, place it in the correctly named directory.
One critical part of the report is an analysis of the performance scalability of your parallel implementation,
compared against your sequential version as a baseline. Your report needs to demonstrate this aspect, for example
with graphs showing completion time at varied levels of parallelism. It should also include your explanations
of the performance observed.
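A common way to summarize such measurements is to report speedup (sequential time divided by parallel time) and efficiency (speedup divided by the number of workers) next to the raw completion times. The sketch below assumes you have already measured the times yourself; the function name and the input layout are illustrative choices, not part of the assignment.

```python
def speedup_table(sequential_seconds, parallel_seconds):
    """Return (workers, time, speedup, efficiency) rows, sorted by worker count.

    sequential_seconds: measured completion time of the sequential baseline.
    parallel_seconds: dict mapping number of workers -> measured completion time.
    """
    rows = []
    for workers in sorted(parallel_seconds):
        elapsed = parallel_seconds[workers]
        speedup = sequential_seconds / elapsed  # how many times faster than the baseline
        efficiency = speedup / workers          # 1.0 would mean perfect scaling
        rows.append((workers, elapsed, speedup, efficiency))
    return rows
```

The resulting rows can be written to a file and plotted with any tool you like. An efficiency well below 1.0 is the kind of observation your report should try to explain.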
For DNA strands, you could compute centroids in different ways. One option is to use the most common
base for each position in the strand among all the strands of the corresponding cluster. Another option
is to derive a probability distribution over the bases, again for each position in the strand. In either
case, you would need to define how to calculate the distance between a centroid and a given strand.
Please clearly describe your definitions in your report.
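The two centroid options above can be sketched as follows. This is only one possible set of definitions, assuming all strands have the same length: the majority centroid is paired with the Hamming distance, and the distribution centroid is paired with the expected number of mismatched positions. Both distance choices and all function names are assumptions made for illustration; you are free to define them differently.

```python
from collections import Counter


def majority_centroid(strands):
    """Centroid built from the most common base at each position.

    Assumes equal-length strands; ties are resolved by Counter.most_common.
    """
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*strands))


def hamming_distance(a, b):
    """Number of positions at which two equal-length strands differ."""
    if len(a) != len(b):
        raise ValueError("strands must have the same length")
    return sum(x != y for x, y in zip(a, b))


def base_distributions(strands):
    """Centroid as a list of per-position probability distributions over bases."""
    n = len(strands)
    return [
        {base: count / n for base, count in Counter(column).items()}
        for column in zip(*strands)
    ]


def expected_mismatches(distributions, strand):
    """Distance between a distribution centroid and a strand.

    Defined here (as an assumption) as the expected number of positions where
    a strand sampled from the centroid would differ from the given strand.
    """
    if len(distributions) != len(strand):
        raise ValueError("centroid and strand must have the same length")
    return sum(1.0 - dist.get(base, 0.0) for dist, base in zip(distributions, strand))
```

For example, majority_centroid(["ACGT", "ACGA", "TCGT"]) yields "ACGT", since A, C, G, and T are the most common bases at positions one through four.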
Random Data Set Generator
You can implement the data set generators in any language of your choice. Please make sure
they can be executed in the environment of ghc Andrew machines.
For 2D points, you can use the provided data set generator, or choose to implement your own version.
For DNA strands, you need to implement one on your own.
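As a starting point, a DNA strand generator could look like the sketch below. It assumes strands are strings over the four bases A, C, G, and T, all of the same length, with every base drawn uniformly at random; the function name and parameters are hypothetical.

```python
import random

BASES = "ACGT"


def generate_strands(count, length, seed=None):
    """Generate `count` random DNA strands, each `length` bases long.

    Bases are drawn uniformly and independently (an assumption of this sketch).
    Passing a seed makes the output reproducible.
    """
    rng = random.Random(seed)
    return ["".join(rng.choice(BASES) for _ in range(length)) for _ in range(count)]
```

Purely uniform strands have no built-in cluster structure, so you may prefer a generator that starts from a few base strands and mutates them; that choice is yours to make.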