CS 213:  Assignment #5

Due:  Before class on Tuesday, May 4, 1999  **NO EXTENSIONS**

Submission Procedure:

  1. Code is to be emailed to the grader bbh4@cornell.edu.
  2. Please refer to the newsgroup for filenames specific to this assignment.
  3. Please refer to the email you received from the grader for submission details.   Email the grader with questions.

Problem #1:  Dictionary and Spell Checker  (2 hours)

For this problem you will be implementing a spell checker class (Dictionary) that uses the multimap class from the standard template library to store words for querying.   You must write a program that takes three command line arguments.  The first is the name of a source text warpeace.txt), the second in the name of a testing text (warped.txt) and the third in the name of an initially empty output file.  The program must parse the source text and insert into an object of class dictionary each unique word that it finds in the text.  It must then parse the test text and report to the output file each word in the text not in the dictionary and also make suggestions of possible replacements.

For the purposes of this assignment, a word is any string that contains only characters, apostrophes, or hyphens that starts with a letter and ends with a letter.  Hyphenated works (e.g., well-known) should be treated as a single word.  Words with apostrophes should also be treated as words with the exception of words ending in "'s" which should be be stripped of this suffix.  (e.g. "don't" is a word, but "Mike's" should be considers as an instance of "Mike".)  Also, all tests should be case insensitive.  (e.g., "Apple" is the same as "apple".)

Your class dictionary should be a template class that takes as a parameter a hash function class that contains a static method h.  The purpose of h is to produce keys from words such that similar words have similar keys.  (Part of the assignment is interpreting this statement.)  When a "misspelling" of a word occurs the dictionary class should produce a list of possible words from the dictionary that have equal or close keys.

Your dictionary class should have the following public methods:

    void Insert(const string& s);         // Insert s into the dictionary
    bool IsMember(const string& s) const;    // return true is s is in the dictionary, false otherwise
    void PutToFile(ofstream& o) const;    // print all the words in the dictionary to the file o (we will use this for testing)
    void WrittenSuggestion(const string& s, ofstream& o) const;     // given a word s, and output file o, print s: { words similar to s in dict. }

You may of course define whatever constructors, destructors and private methods you find necessary.  You must pass the file streams as parameters as indicated.

Note:  War and Peace contains over half a million words; be aware of this when you write your getword function (* a little optimization goes a long way *).  A good solution on a reasonable machine should be able to parse the text and insert words in < 5 minutes.

Note that it is not necessary for the program to produce a suggestion for each misspelled word - if there are no reasonable guesses then don't make any.  My sample text warped.txt has 12 words which do not appear in the original text.  We will test your solution with a different input.

For a small bonus, write a hash function class that hashes all anagrams of a word to the same key.  (i.e, two words have the same key if and only if they are anagrams of each other.)  You may assume that no letter appears more than 4 times in a word.   You should discount apostrophes and hyphens in constructing anagrams.  As a key you must hash some sort of integer (of any length, signed or unsigned).  You may find that you need to use 64 bit integers for this.

While I will not provide any help with the bonus, here is a list of other helpful hints.


Note that your grade on each of these problems will be based in large part on your proper usage of the facilities of the C++ language and not strictly on functionality.

Last Updated:  Tuesday, December 14, 2004 12:17:55 PM