Discussion 5 handout

Group members (names & NetIDs)

Objectives

Preparation: Demo code and example data

Please download dis05-release.zip, extract it to a known location on your computer, and open it as a project in IDEA.

Task: Text file index

Your goal is to create a reverse index of a text file. For each line in the file, you want to keep track of the distinct words that occur on that line. Then using this index, you can report which lines a given word occurs on.

This problem can be solved with fairly little code by leveraging Java’s built-in implementations of appropriate ADTs. By solving the problem in terms of ADT operations (rather than jumping straight to code), it is easier to verify that your solution will work, and the same algorithm can readily be implemented in multiple languages.

Identify ADTs

The primary operation of our index is to report which distinct words appear on a given line. What would be an appropriate return type for this query?

Its secondary operation is to report which line numbers a given word occurs on. What would be an appropriate return type for this query?

Select a combination of ADTs (from those supported by Java’s collections framework) that would be suitable for storing the information needed to create the index. Specify their generic types.

Construct the index

In “Index.java”, declare fields of the types you selected above. Initialize these fields by constructing instances of appropriate classes from the Java collections framework.

Implement the constructor for Index to populate these fields.
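If you are unsure how to start, the following is a minimal sketch of one possible combination of fields and a constructor that populates them. It is not the reference solution. The class name LineIndexSketch, its field and method names, and its constructor taking a list of lines are all hypothetical; the released “Index.java” may use different names and a different constructor signature. The sketch also assumes that a word is a run of ASCII letters compared case-insensitively and that lines are numbered from 1, which the specification in the release code may define differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical stand-in for Index; names and signatures are assumptions. */
class LineIndexSketch {
    /** Element i holds the distinct words on line i + 1 (lines numbered from 1). */
    private final List<Set<String>> wordsByLine = new ArrayList<>();

    /** Maps each word to the numbers of the lines it occurs on. */
    private final Map<String, Set<Integer>> linesByWord = new TreeMap<>();

    /** Assumption: a word is a run of ASCII letters, compared case-insensitively. */
    LineIndexSketch(List<String> lines) {
        for (int i = 0; i < lines.size(); i++) {
            Set<String> words = new TreeSet<>();
            String lower = lines.get(i).toLowerCase(Locale.ROOT);
            for (String token : lower.split("[^a-z]+")) {
                if (token.isEmpty()) {
                    continue; // split() can yield an empty token for blank lines or leading separators
                }
                words.add(token);
                // Create the word's line set on first sight, then record this line
                linesByWord.computeIfAbsent(token, k -> new TreeSet<>()).add(i + 1);
            }
            wordsByLine.add(words);
        }
    }

    /** Number of distinct words on the given 1-based line. */
    int distinctWordCount(int lineNumber) {
        return wordsByLine.get(lineNumber - 1).size();
    }

    /** Whether the word occurs on the given 1-based line. */
    boolean occursOn(String word, int lineNumber) {
        Set<Integer> lines = linesByWord.getOrDefault(word.toLowerCase(Locale.ROOT), Set.of());
        return lines.contains(lineNumber);
    }
}
```

Storing a set per line removes duplicate words automatically, and keeping a second map from words to line numbers trades extra memory for a fast lookup in the other direction. A single structure can also support both queries; compare the trade-offs with your group.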

Query the index

Read the specification for wordsOneLine(), then implement it using your fields. Note the word “creates” in the spec. Depending on your choice of fields, it might be possible to return an existing collection from your index’s state instead of creating a new one; why do you think creating a new collection is important?
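One way to explore this question is to run a small experiment like the sketch below. The class CopyOnQuerySketch and its methods are hypothetical and are not part of the release code; they only contrast a query that returns its internal set with one that returns a copy.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical class for experimenting with returned collections. */
class CopyOnQuerySketch {
    /** Internal state standing in for the words stored for one line. */
    private final Set<String> words = new HashSet<>(Set.of("alpha", "beta"));

    /** Returns the internal set itself. */
    Set<String> exposed() {
        return words;
    }

    /** Creates and returns a new set with the same elements. */
    Set<String> copied() {
        return new HashSet<>(words);
    }

    /** Number of words currently stored in the internal set. */
    int size() {
        return words.size();
    }

    public static void main(String[] args) {
        CopyOnQuerySketch sketch = new CopyOnQuerySketch();

        sketch.copied().clear(); // a client modifies the collection it was given
        System.out.println("after clearing the copy: " + sketch.size());

        sketch.exposed().clear(); // same client action on the other query's result
        System.out.println("after clearing the exposed set: " + sketch.size());
    }
}
```

Compare the two printed sizes and discuss what each implies for a client who modifies the collection returned by a query.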

Read the specification for linesWithWord(), then implement it using your fields.

Look at main(), which constructs an index for the included file “Austen.txt”. Add additional code to answer the following questions about the resulting index and check your results with a neighboring group or with a consultant:

  1. How many distinct words appear on line 18?

  2. Does the word “young” appear on line 13?

  3. How many different lines does the word “with” appear on?
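Each of the three questions above reduces to a single operation on a query result. The sketch below shows the pattern on made-up data rather than on “Austen.txt”. The class QuerySketch and its helper methods are hypothetical, and it assumes only that your queries return some kind of java.util.Collection; check the actual return types in the specifications.

```java
import java.util.Collection;
import java.util.List;
import java.util.Set;

/** Hypothetical helpers showing how the questions map to collection operations. */
class QuerySketch {
    /** Number of elements in a query result. */
    static int count(Collection<?> result) {
        return result.size();
    }

    /** Whether a word is among the words reported for a line. */
    static boolean appears(Collection<String> wordsOnLine, String word) {
        return wordsOnLine.contains(word);
    }

    public static void main(String[] args) {
        // Made-up stand-ins for the results of the two index queries
        Collection<String> wordsOnLine = Set.of("a", "single", "man");
        Collection<Integer> linesWithWord = List.of(2, 5, 9);

        System.out.println(count(wordsOnLine));             // how many distinct words on the line
        System.out.println(appears(wordsOnLine, "single")); // does a word appear on the line
        System.out.println(count(linesWithWord));           // how many lines contain a word
    }
}
```

Counting with size() answers “how many different lines” only if each line number appears once in the result, which holds for a set but not necessarily for a list.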

Submission

  1. Open the assignment page for “Discussion activity 5” in CMSX
  2. [Recorder] Find the “Group Management” section and invite each group member
  3. [Others] Refresh the page and accept your invitation
  4. [Recorder] Take a picture of your work and save it as either a JPEG or a PDF file named “discussion_responses” (you do not need to submit your test code). After all invitations have been accepted, upload your picture along with your code as your group’s submission.
    • Recommended scanning apps: Microsoft Office Lens, Adobe Scan, Genius Scan, Evernote Scannable

Ensure that your group is formed and your work submitted before the Friday evening deadline.

Tips and reminders