Lab 7 - Hash Table Spell Checking
Due: Monday, April 7th at 11:59PM

Overview

One straight-forward way to implement a fast spell checker is to use a hash table. If a word is found, it is assumed to be spelled correctly. Otherwise, it is assumed to be spelled incorrectly.

The performance of the spell checker can be further improved by maintaining a separate hash table that maps common misspellings to their correct spelling. This can enable the spell checker to suggest a correct spelling for commonly misspelled words.

The performance can be further improved by correcting common mistakes in words and searching again, for example, "ie" vs "ei" or "as" instead of "sa", &c. If the adjusted spelling is found, it can be suggested.

This assignment asks you to build a spell checker capable of identifying correctly spelled words, suggesting words from a hash table of commonly misspelled words, and making as many other enhancements as you'd like.

The Hash Table

You should build a generic open chain hash table, constructed as an array or Vector of linked lists. It should have an insert() method and a retrieve() method:

Objects should be inserted and found based on their hashCode().

The Spell Checker, Itself

You should design and implement a SpellChecker class, which contains the hash tables needed to perform spell checking and which provides the appropriate functionality. This class should contain at least one constructor:

Additionally, the SpellChecker must have a method as follows:

This method should output each potentially misspelled word with potentially correct spellings. The output should be readable and nicely formatted.

The Dictionary Hash Table

You have been provided with a dictionary that contains root words. These should be inserted into the dictionary hash table based on the hashCode() of their String representation.

The Misspellings Hash Table

You should create a new class of Objects to represent misspelled words. It should contain the misspelling and the correct spelling and should be able to report each. It should also override the hashCode() method to return the hashCode() of the misspelling. This will enable the retrieve() method of your hash table to find this object and the correct spelling within it, given the misspelling.

Finding Root Words

The provided dictionary only contains root words, not each of their forms. So, in order to find a word, you might need to reduce it to its root form. So, for each word, you should try at least the following:

Program Main

The SpellChecker class should also have a main method. This method should acept all of the files as command-line arguments, should instantiate a new SpellChecker, and should spellCheck() the requested document file.

Chain Length Plot

Experiment making the hash table different sizes, holding the dictionary size as a constant. When you find the interesting range of sizes, please produce a plot of the minimum, maximum, and average chain sizes as a function of the table size.

You'll probably want to write a separate main program to collect and output this data. Then, please use Excel, gnuplot, or the tool of your choice to produce a plot the interesting size range. The interesting size range is the minimum range that shows the table, in practice, being too large through the table being too small.

Useful Files