Lecture 25 activity
Objectives
- Associate auxiliary attributes with objects
- Share data with recursive helper functions
- Modify depth-first search to implement topological sort
- Extend topological sort with filtering
Overview
Download the lec25 code and open it as a project in IntelliJ. Briefly explore the Graph
and Vertex
interfaces; you will only need a few of their methods for this activity:
Vertex::label()
: Observe the vertex’s labelVertex::successors()
: Allow iterating over other vertices linked via outgoing edgesGraph::vertices()
: Allow iterating over all vertices in a graphOopGraph.readGraph()
: Static factory method for reading directed graphs from a file.
Next, look at Main1
, which uses Vertex
’s factory method to load a graph from “courses.txt”. This graph is a simplified representation of the prerequisites of Cornell’s Computer Science courses (ignoring corequisites and equivalencies); edges point from prerequisites to the courses that depend on them. The goal of Main1
is simply to print every course as discovered by depth-first search from multiple starting vertices; later, you will make modifications to print the courses in topological order, and then to only print the courses on the prerequisite chain of the courses that you want to eventually take.
Finally, note VertexState
: it succinctly defines an enum
type for the “discovery states” that vertices go through during traversal; you can refer to a value as if it were a static field; e.g. VertexState.SETTLED
is the value to use to indicate that all of a vertex’s successors have been discovered by the traversal.
Implementing depth-first search
Depth-first search can be expressed very succinctly using recursion:
/** Settle all vertices reachable along undiscovered paths from `start`.
* Requires: `start` is undiscovered. */
dfsVisit(Vertex start) {
start.state = DISCOVERED;
for (v : start.successors()) {
if (v.state == UNDISCOVERED) {
dfsVisit(v);
}
}
start.state = SETTLED;
}
But there are some potential obstacles to implementing this as written:
- How should the starting vertex start be selected?
- What if the user wants to explore the entire graph? If the graph has disconnected components, no single start will suffice.
- The notion of “discovery state” is convenient for the algorithm, but what if the
Vertex
type doesn’t have astate
property that the algorithm can modify?
Start with the last obstacle. What data structure could the algorithm employ itself to efficiently associate each vertex with a state without modifying the Vertex
class?
Now that you’ve identified a way to track additional properties associated with each vertex, you need to share that data with each recursive invocation, and you need to create that data structure in the first place. Having each recursive call create its own copy of the data structure wouldn’t work, because then the data wouldn’t be shared between calls. This suggests that the task should be assigned to a different method, rather than our recursive dfsVisit()
. This is a common pattern, and we’ve seen it before with trees: the other method becomes a “driver,” while the recursive method serves as its helper function.
In Main1
, method dfs()
serves as the driver, while dfsVisit()
is the recursive helper. Note how the driver is responsible for looping over all possible starting vertices, resolving the concern above.
Modify the signature of dfsVisit()
so that you can provide each call with state data for all vertices.
Then construct an appropriate data structure for storing that state data at the appropriate place in the driver. Does this data structure need to be initialized with any data? If so, write the initialization; if not, explain how you can get by without it? (Hint: either answer can be correct.)
Finally, translate the pseudocode of each method into Java. Run your program to verify that all courses are printed.
Returning the vertices
Right now dfs()
prints the label of each course it visits, but it would be a more versatile function if it instead returned a list of these vertices. Since Java’s standard collection classes are mutable, this is again a situation where it is most efficient to share objects between multiple recursive calls. Modify the signatures (and specifications) of dfs()
and dfsVisit()
so that dfs()
returns a list of the vertices visited (in the same order they would have been printed). Then modify main()
to print the contents of this list to check your work.
Topological sort
By printing or appending vertices at the top of dfsVisit()
, we are processing the vertices in discovery order (that is, we do something special with a vertex before visiting its neighbors, analogous to preorder traversal in trees).
It turns out that, in order to perform a topological sort using DFS, we simply need to process vertices in reverse settled order (analogous to reverse postorder in trees).
“Settling” means processing the vertex only after discovering all of its neighbors, while the reversal can be achieved by prepending, rather than appending, to the list that is returned.
You will make these changes in Main2
, which has updated method names and specifications accordingly. Feel free to copy your solution from Main1
as a starting point.
Since you will be prepending vertices to the results list, which implementation of the List
interface should your driver construct and hand to its recursive helper? Why?
Modify your previous implementation of dfsVisit()
to satisfy the new specifications and thus help perform a topological sort. Note that a topological sort is impossible if there is a cycle in the graph, but DFS can detect cycles if it ever encounters a successor that is already discovered but not settled.
Check your results against the graph above; do courses always come somewhere after their prerequisites?
Filtering results
Most students do not take every CS class that the department offers; instead, they are interested in the topologically sorted subgraph that fulfills all of the prerequisites of the courses they want to take. You will implement this filtering in Main3
(again, feel free to copy your solution from Main2
as a starting point).
In Main3
, dfsVisit()
now returns a value. Read the specifications carefully, then modify your implementation to compute the appropriate return value. This should feel familiar to the “counting pattern” used in some recursive methods on trees.
An important case to consider is what happens when a successor is already marked as settled. This does not indicate a cycle; it simply means that an earlier search started in the middle of the dependency structure, while the current search started earlier (in a topological sense) and has now caught up with the previous results. How can you determine whether the previously searched vertex was on a path to a target?
Try out your new course planning tool! List the CS classes you are most interested in (e.g. “CS4450 CS4120”) in the program arguments and run the program, and it will show you which prerequisites to take in which order.