Discussion 12 handout

Objectives

Add auxiliary properties to objects
Share data with recursive helper functions
Modify depth-first search to implement topological sort
Extend topological sort with filtering

The lecture notes on graph traversals will be a useful reference throughout this activity. Note: the skills practiced here will be very useful when completing A7.

Overview

Download the Lab 12 code and open it as a project in IDEA. Briefly explore the Node class used to represent graph vertices. There are only three methods available to you:

label(): Observe the node’s label
successors(): Allow iterating over other nodes linked via outgoing edges
readGraph(): Static factory method for reading directed graphs from a file. The returned representation is an adjacency list.

Next, look at Main1, which uses Node’s factory method to load a graph from “courses.txt”. This graph is a simplified representation of the prerequisites of Cornell’s Computer Science courses (ignoring corequisites and equivalencies); edges point from prerequisites to the courses that depend on them. The goal of Main1 is simply to print every course as discovered by depth-first search from multiple starting nodes; later, you will make modifications to print the courses in topological order, and then to only print the courses on the prerequisite chain of the courses that you want to eventually take.

Finally, note Color: it succinctly defines an enum type for the colors used in the tricolor theorem; you can refer to a value as if it were a static field; e.g. Color.GRAY.

Implementing depth-first search

Depth-first search can be expressed very succinctly using recursion:

/** Color black all nodes reachable on paths from g where all
 *  nodes (except g) on each such path are white.
 *  Requires: g is gray. */
dfsVisit(Node g) {
    for (v in g.successors) {
        if (v.color is white) {
            v.color = gray;
            dfsVisit(v);
        }
    }
    g.color = black;
}

But there are some potential obstacles to implementing this as written:

How should the starting node g be selected, and who is responsible for coloring it gray to satisfy the precondition?
What if the user wants to explore the entire graph? If the graph has disconnected components, no single g will suffice.
The notion of color is convenient for the algorithm, but what if the Node type doesn't have a color property that the algorithm can modify?

Start with the last obstacle. What data structure could the algorithm employ to efficiently associate each node with a color without modifying the Node class?

Now that you’ve identified a way to track additional properties associated with each node, you need to share that data with each recursive invocation, and you need to create that data structure in the first place. Having each recursive call create its own copy of the data structure wouldn’t work, because then the data wouldn't be shared between calls. This suggests that the task should be assigned to a different method, rather than our recursive dfsVisit(). This is a common pattern: the other method becomes a “driver,” while the recursive method serves as a helper function.

In Main1, method dfs() serves as the driver, while dfsVisit() is the recursive helper. Note how the driver is responsble for looping over all possible starting nodes and coloring them gray if they still need to be visited, resolving the concerns above.

Modify the signature of dfsVisit() so that you can provide each call with color data for all nodes.

Then construct an appropriate data structure for storing that color data at the appropriate place in the driver. Does this data structure need to be initialized with any data? If so, write the initialization; if not, explain how you can get by without it? (Hint: either answer can be correct.)

Finally, translate the pseudocode of each method into Java. Run your program to verify that all courses are printed.

Returning the nodes

Right now dfs() prints the label of each course it visits, but it would be a more versitile function if it instead returned a list of these nodes. Since Java’s standard collection classes are mutable, this is again a situation where it is most efficient to share objects between multiple recursive calls. Modify the signatures (and specifications) of dfs() and dfsVisit() so that dfs() returns a list of the nodes visited (in the same order they would have been printed). Then modify main() to print the contents of this list to check your work.

Topological sort

By printing or appending nodes at the top of dfsVisit(), we are processing the nodes in preorder (that is, we do something special with a node before visiting its neighbors). It turns out that, in order to perform a topological sort using DFS, we simply need to process nodes in reverse postorder. “Postorder” means processing the node after visiting its neighbors, while the reversal can be achieved by prepending, rather than appending, to the list that is returned.

You will make these changes in Main2, which has updated method names and specifications accordingly. Feel free to copy your solution from Main1 as a starting point.

Since you will be prepending nodes to the results list, which implementation of the List interface should your driver construct and hand to its recursive helper? Why?

Modify your previous implementation of dfsVisit() to satisfy the new specifications and thus help perform a topological sort. Note that a topological sort is impossible if there is a cycle in the graph, but DFS can detect cycles if it ever encounters a successor that is already gray (to comply with the “checks” clause, you should throw a runtime exception or fail an assert in this case). Check the results against the graph above; do courses always come somewhere after their prerequisites?

Filtering results

Most students do not take every CS class that the department offers; instead, they are interested in the topologically sorted subgraph that fulfills all of the prerequisites of the courses they want to take. You will implement this filtering in Main3 (again, feel free to copy your solution from Main2 as a starting point).

In Main3, dfsVisit() now returns a value. Read the specifications carefully, then modify your implementation to compute the appropriate return value. This should feel familiar to the “counting pattern” used by some methods to process trees in A4.

An important case to consider is what happens when a successor is already colored black. This does not indicate a cycle; it simply means that an earlier search started in the middle of the dependency structure, while the current search started earlier (in a topological sense) and has now caught up with the previous results. How can you determine whether the previously searched node was on a path to a target?

Try out your new course planning tool! List the CS classes you are most interested in (e.g. “CS4450 CS4120”) in the program arguments and run the program, and it will show you which prerequisites to take in which order.

Submission

Open the assignment page for “Discussion activity 12” in CMSX
[Recorder] Find the “Group Management” section and invite each group member
[Others] Refresh the page and accept your invitation
[Recorder] Take a picture of your work and save as either a JPEG or a PDF file named “discussion_responses”. After all invitations have been accepted, upload your picture, as well as “Main2.java”, as your group’s submission.
- Recommended scanning apps: Microsoft Office Lens, Adobe Scan, Genius Scan, Evernote Scannable

Ensure that your group is formed and your work submitted before the Friday evening deadline.

Tips and reminders

Discussion is not a race. Focus on the current activity being facilitated by your TA and engage your whole group to propose and explain ideas.
Elect a recorder to maintain the “master” copy of your work (others are still encouraged to jot down ideas on scratch paper). Rotate this position each week.
It is a violation of academic integrity to credit someone for work if they were not present when the work was done, and the whole group is accountable. Your CMS group must only include classmates who attended section with you on that day. Remember that our participation policies accommodate occasional absences without penalty.
It is your individual responsibility to ensure that you are grouped on CMS and that your group’s work is submitted before the deadline. Log into CMS the day after section to ensure that everything is in order, and contact your groupmates if not. It would be prudent for multiple members to photograph the group’s work.
Only one group member (typically the recorder) needs to upload a submission. But their submission must not be uploaded until after all group members have confirmed their membership in CMS (contact your TA if you have trouble grouping in CMS).