This week we're going to talk about some standard algorithms on graphs of which you need to be familiar. But first we need to talk about how to represent graphs.
A graph G normally is considered to be a pair (V,E) of a set of vertices V and a set of edges E. A very common representation of graphs is the adjacency list, which consists of an array of vertices, each of which contains a list of all adjacent vertices (in an arbitrary order). For example, we can represent the graph
[0] --- [1] | / | \ | / | [2] | / | / [4] --- [3]
as
[| [1,4], [0,4,2,3], [1,3], [1,4,2], [3,0,1] |] : int list array
(SML uses the [| |] syntax for printing arrays, but you cannot construct arrays using this syntax.) For an undirected graph, the sum of the lengths of all the adjacency lists is 2|E|, so the total amount of memory required is O(V+E).
What about directed graphs? (By the way, a tree is simply a directed acyclic graph.) Here's an example:
[0] --> [1] [2] | > | / | | / | / | v / v v v v--\ [3] <-- [4] [5] --/
For a directed graph, the sum of the lengths of the lists is |E|, so the total amount of memory required is still O(V+E).
An advantage of the adjacency list representation is that it easily can be extended to support variants of graphs. For example, how could we modify our representation to support weighted graphs? (A weighted graph is a graph for which each edge has an associated numerical weight.)
What is the running time of an algorithm to determine whether a given edge is in the graph? To compute the degree of a vertex? To add or remove an edge? To traverse the graph?
What if we stored each adjacency list as an array instead? Or stored the array of vertices as a list? These all are asymptotically equivalent in their time and space properties.
How could we store a graph to make determination of the existence of an edge in constant time possible? We could store the adjacency information in a matrix of |V|×|V| booleans indicating whether a given edge is in the graph. (Draw the adjacency matrices for the above graphs.) Adjacency matrices require O(V2) space regardless of the number of edges, so they work well for dense graphs -- graphs in which the number of edges is large compared to the number of vertices. Adjacency lists, on the other hand, work well for sparse graphs.
In terms of |V|, how big could |E| possibly become? So what is the potential space requirement of adjacency lists in terms of |V|?
Answer the same running time questions for adjacency matrices as we did for adjacency lists.
How does our representation change for directed and undirected graphs? How could we represent weighted graphs?
Both the adjacency list and the adjacency matrix are vertex-centric representations. As an alternative, we could use an edge-centric representation as we did in problem set 3: We represented graphs as a list of named edges for each node.
The most basic graph operation is to traverse a graph. The key idea in graph traversal is to mark the vertices as we visit them and to keep track of what we have not yet explored. We will consider each vertex to be in one of the states undiscovered, discovered, or completely explored. We'll start with only a single discovered vertex, and we'll consider each incident edge. If an edge connects to an undiscovered vertex, we'll mark the vertex as discovered and add it to our set of vertices to process. After we've looked at all the edges of a vertex, we'll grab another vertex from the set of discovered vertices. The order in which we explore the vertices depends on how we maintain the collection of discovered vertices.
If we use a queue, we explore the oldest unexplored vertices first. This is known as a breadth-first search, or BFS. Each vertex (except the starting vertex) is discovered during the processing of one other vertex, so this defines the breadth-first search tree. Notice that the shortest path from the root to any other vertex is a path in this tree. And for undirected graphs, nontree edges can point only to vertices on the same level as the parent vertex or to vertices on the level directly below the parent. Why?
If we use a stack to store the discovered vertices instead, we venture along a single path away from the starting vertex until there are no more undiscovered vertices in front of us. This is known as a depth-first search. Just as with breadth-first search, we can define a depth-first search tree that results from our traversal. In a depth-first search of an undirected graph, every edge is either a tree edge or a back edge; there are no forward edges or cross edges. Why?
Of course, such a traversal will only traverse a connected component; if the graph has multiple components, we need to be careful to do a DFS (or BFS) of each component.