Discussion 11 handout

Group members (names & NetIDs)

Objectives

Note: the skills practiced here will be very useful when completing A6.

Overview

Download the dis11 code and open it as a project in IntelliJ. Briefly explore the Vertex class used to represent graph vertices. There are only three methods available to you:

Next, look at Main1, which uses Vertex’s factory method to load a graph from “courses.txt”. This graph is a simplified representation of the prerequisites of Cornell’s Computer Science courses (ignoring corequisites and equivalencies); edges point from prerequisites to the courses that depend on them. The goal of Main1 is simply to print every course as discovered by depth-first search from multiple starting vertices; later, you will make modifications to print the courses in topological order, and then to only print the courses on the prerequisite chain of the courses that you want to eventually take.

CS course dependency graph

Finally, note VertexState: it succinctly defines an enum type for the “discovery states” that vertices go through during traversal; you can refer to a value as if it were a static field; e.g. VertexState.SETTLED is the value to use to indicate that all of a vertex’s successors have been discovered by the traversal.

Depth-first search can be expressed very succinctly using recursion:

/** Settle all vertices reachable along undiscovered paths from g.
 *  Requires: g is undiscovered. */
dfsVisit(Vertex g) {
    g.state = DISCOVERED;
    for (v in g.successors) {
        if (v.state == UNDISCOVERED) {
            dfsVisit(v);
        }
    }
    g.state = SETTLED;
}

But there are some potential obstacles to implementing this as written:

  1. How should the starting vertex g be selected?
  2. What if the user wants to explore the entire graph? If the graph has disconnected components, no single g will suffice.
  3. The notion of “discovery state” is convenient for the algorithm, but what if the Vertex type doesn’t have a state property that the algorithm can modify?

Start with the last obstacle. What data structure could the algorithm employ itself to efficiently associate each vertex with a state without modifying the Vertex class?

Now that you’ve identified a way to track additional properties associated with each vertex, you need to share that data with each recursive invocation, and you need to create that data structure in the first place. Having each recursive call create its own copy of the data structure wouldn’t work, because then the data wouldn’t be shared between calls. This suggests that the task should be assigned to a different method, rather than our recursive dfsVisit(). This is a common pattern, and we’ve seen it before with trees: the other method becomes a “driver,” while the recursive method serves as its helper function.

In Main1, method dfs() serves as the driver, while dfsVisit() is the recursive helper. Note how the driver is responsble for looping over all possible starting vertices, resolving the concern above.

Modify the signature of dfsVisit() so that you can provide each call with state data for all vertices.

Then construct an appropriate data structure for storing that state data at the appropriate place in the driver. Does this data structure need to be initialized with any data? If so, write the initialization; if not, explain how you can get by without it? (Hint: either answer can be correct.)

Finally, translate the pseudocode of each method into Java. Run your program to verify that all courses are printed.

Returning the vertices

Right now dfs() prints the label of each course it visits, but it would be a more versitile function if it instead returned a list of these vertices. Since Java’s standard collection classes are mutable, this is again a situation where it is most efficient to share objects between multiple recursive calls. Modify the signatures (and specifications) of dfs() and dfsVisit() so that dfs() returns a list of the vertices visited (in the same order they would have been printed). Then modify main() to print the contents of this list to check your work.

Topological sort

By printing or appending vertices at the top of dfsVisit(), we are processing the vertices in discovery order (that is, we do something special with a vertex before visiting its neighbors, analogous to preorder traversal in trees). It turns out that, in order to perform a topological sort using DFS, we simply need to process vertices in reverse settled order (analogous to postorder in trees). “Settling” means processing the vertex only after discovering all of its neighbors, while the reversal can be achieved by prepending, rather than appending, to the list that is returned.

You will make these changes in Main2, which has updated method names and specifications accordingly. Feel free to copy your solution from Main1 as a starting point.

Since you will be prepending vertices to the results list, which implementation of the List interface should your driver construct and hand to its recursive helper? Why?

Modify your previous implementation of dfsVisit() to satisfy the new specifications and thus help perform a topological sort. Note that a topological sort is impossible if there is a cycle in the graph, but DFS can detect cycles if it ever encounters a successor that is already discovered but not settled.

Check your results against the graph above; do courses always come somewhere after their prerequisites?

Filtering results

Most students do not take every CS class that the department offers; instead, they are interested in the topologically sorted subgraph that fulfills all of the prerequisites of the courses they want to take. You will implement this filtering in Main3 (again, feel free to copy your solution from Main2 as a starting point).

In Main3, dfsVisit() now returns a value. Read the specifications carefully, then modify your implementation to compute the appropriate return value. This should feel familiar to the “counting pattern” used by some methods to process trees in A4.

An important case to consider is what happens when a successor is already marked as settled. This does not indicate a cycle; it simply means that an earlier search started in the middle of the dependency structure, while the current search started earlier (in a topological sense) and has now caught up with the previous results. How can you determine whether the previously searched vertex was on a path to a target?

Try out your new course planning tool! List the CS classes you are most interested in (e.g. “CS4450 CS4120”) in the program arguments and run the program, and it will show you which prerequisites to take in which order.

Submission

  1. Open the assignment page for “Discussion activity 11” in CMSX
  2. [Recorder] Find the “Group Management” section and invite each group member
  3. [Others] Refresh the page and accept your invitation
  4. [Recorder] Take a picture of your work and save as either a JPEG or a PDF file named “discussion_responses”. After all invitations have been accepted, upload your picture along with your code as your group’s submission.
    • Recommended scanning apps: Microsoft Office Lens, Adobe Scan, Genius Scan, Evernote Scannable

Ensure that your group is formed and your work submitted before the Friday evening deadline.

Tips and reminders