The lecture notes on graph traversals will be a useful reference throughout this activity. Note: the skills practiced here will be very useful when completing A7.
Download the Lab 12 code and open it as a project in IDEA. Briefly explore the Node
class used to represent graph vertices. There are only three methods available to you:
label()
: Observe the node’s labelsuccessors()
: Allow iterating over other nodes linked via outgoing edgesreadGraph()
: Static factory method for reading directed graphs from a file. The returned representation is an adjacency list.Next, look at Main1
, which uses Node
’s factory method to load a graph from “courses.txt”. This graph is a simplified representation of the prerequisites of Cornell’s Computer Science courses (ignoring corequisites and equivalencies); edges point from prerequisites to the courses that depend on them. The goal of Main1
is simply to print every course as discovered by depth-first search from multiple starting nodes; later, you will make modifications to print the courses in topological order, and then to only print the courses on the prerequisite chain of the courses that you want to eventually take.
Finally, note Color
: it succinctly defines an enum
type for the colors used in the tricolor theorem; you can refer to a value as if it were a static field; e.g. Color.GRAY
.
Depth-first search can be expressed very succinctly using recursion:
/** Color black all nodes reachable on paths from g where all
* nodes (except g) on each such path are white.
* Requires: g is gray. */
dfsVisit(Node g) {
for (v in g.successors) {
if (v.color is white) {
v.color = gray;
dfsVisit(v);
}
}
g.color = black;
}
But there are some potential obstacles to implementing this as written:
Node
type doesn't have a color
property that the algorithm can modify?Start with the last obstacle. What data structure could the algorithm employ to efficiently associate each node with a color without modifying the Node
class?
Now that you’ve identified a way to track additional properties associated with each node, you need to share that data with each recursive invocation, and you need to create that data structure in the first place. Having each recursive call create its own copy of the data structure wouldn’t work, because then the data wouldn't be shared between calls. This suggests that the task should be assigned to a different method, rather than our recursive dfsVisit()
. This is a common pattern: the other method becomes a “driver,” while the recursive method serves as a helper function.
In Main1
, method dfs()
serves as the driver, while dfsVisit()
is the recursive helper. Note how the driver is responsble for looping over all possible starting nodes and coloring them gray if they still need to be visited, resolving the concerns above.
Modify the signature of dfsVisit()
so that you can provide each call with color data for all nodes.
Then construct an appropriate data structure for storing that color data at the appropriate place in the driver. Does this data structure need to be initialized with any data? If so, write the initialization; if not, explain how you can get by without it? (Hint: either answer can be correct.)
Finally, translate the pseudocode of each method into Java. Run your program to verify that all courses are printed.
Right now dfs()
prints the label of each course it visits, but it would be a more versitile function if it instead returned a list of these nodes. Since Java’s standard collection classes are mutable, this is again a situation where it is most efficient to share objects between multiple recursive calls. Modify the signatures (and specifications) of dfs()
and dfsVisit()
so that dfs()
returns a list of the nodes visited (in the same order they would have been printed). Then modify main()
to print the contents of this list to check your work.
By printing or appending nodes at the top of dfsVisit()
, we are processing the nodes in preorder (that is, we do something special with a node before visiting its neighbors). It turns out that, in order to perform a topological sort using DFS, we simply need to process nodes in reverse postorder. “Postorder” means processing the node after visiting its neighbors, while the reversal can be achieved by prepending, rather than appending, to the list that is returned.
You will make these changes in Main2
, which has updated method names and specifications accordingly. Feel free to copy your solution from Main1
as a starting point.
Since you will be prepending nodes to the results list, which implementation of the List
interface should your driver construct and hand to its recursive helper? Why?
Modify your previous implementation of dfsVisit()
to satisfy the new specifications and thus help perform a topological sort. Note that a topological sort is impossible if there is a cycle in the graph, but DFS can detect cycles if it ever encounters a successor that is already gray (to comply with the “checks” clause, you should throw a runtime exception or fail an assert in this case). Check the results against the graph above; do courses always come somewhere after their prerequisites?
Most students do not take every CS class that the department offers; instead, they are interested in the topologically sorted subgraph that fulfills all of the prerequisites of the courses they want to take. You will implement this filtering in Main3
(again, feel free to copy your solution from Main2
as a starting point).
In Main3
, dfsVisit()
now returns a value. Read the specifications carefully, then modify your implementation to compute the appropriate return value. This should feel familiar to the “counting pattern” used by some methods to process trees in A4.
An important case to consider is what happens when a successor is already colored black. This does not indicate a cycle; it simply means that an earlier search started in the middle of the dependency structure, while the current search started earlier (in a topological sense) and has now caught up with the previous results. How can you determine whether the previously searched node was on a path to a target?
Try out your new course planning tool! List the CS classes you are most interested in (e.g. “CS4450 CS4120”) in the program arguments and run the program, and it will show you which prerequisites to take in which order.
Ensure that your group is formed and your work submitted before the Friday evening deadline.