Today we're going to talk about some standard algorithms on graphs of which you need to be familiar. But first we need to talk about how to represent graphs.
A graph G normally is considered to be a pair (V,E) of a set of vertices V and a set of edges E. A very common representation of graphs is the adjacency list, which consists of an array of vertices, each of which contains a list of all adjacent vertices (in an arbitrary order). For example, we can represent the graph
[0] --- [1] | / | \ | / | [2] | / | / [4] --- [3]
as
[| [1,4], [0,4,2,3], [1,3], [1,4,2], [3,0,1] |] : int list array
(SML uses the [| |] syntax for printing arrays, but you cannot construct arrays using this syntax.) For an undirected graph, the sum of the lengths of all the adjacency lists is 2|E|, so the total amount of memory required is O(V+E).
What about directed graphs? (By the way, a tree is simply a directed acyclic graph.) Here's an example:
[0] -->[1] [2] | -> | / | | / | / | | / | / v /--\ v/ vv vv | [3] <--[4] [5] --/
For a directed graph, the sum of the lengths of the lists is |E|, so the total amount of memory required is still O(V+E).
The adjacency list can also be stored as a list of references instead of an array. This will make the addition of new nodes more efficient, allowing us to add one node without changing the structure that represents the rest of the graph. References will allow changing the outgoing edges of a node without changing the rest of the structure.
The representation can be further extended to support weighted graphs. (A weighted graph is a graph for which each edge has an associated numerical weight.) For this, we change the list of outgoing edges to a list of pairs of outgoing edges and their weights.
Here is a signature and a structure that implements a weighted graph using an adjacency list:
(* A signature for directed graphs.*) signature WGRAPH = sig type graph (* A directed graph comprising a set of vertices * V and directed edges E with nonnegative weights. *) type vertex (* A vertex, or node, of the graph *) type edge (* A edge of the graph *) (* eq(v1,v2) is true iff v1 and v2 are the same vertex. *) val eq: vertex * vertex -> bool (* All vertices in the graph, without duplicates. * Run time: O(|V|). *) val vertices: graph -> vertex list (* outgoing(v) is a list of the edges leaving the vertex v. * Run time: linear in the length of the result. *) val outgoing: vertex -> edge list (* edgeinfo(e) is (src,dst,w) where src is the source of * the edge e, dst is its destination, and w is its weight *) val edgeinfo: edge -> vertex * vertex * int (* add_vertex(g) is (g',v) where g' contains the same vertices * and edges as g, plus a new vertex v with no outgoing edges. *) val add_vertex: graph -> graph*vertex (* Effects: add_edge(src,dst,w) adds an edge from src to dst, * with weight w. *) val add_edge: vertex * vertex * int -> unit = end
structure Graph : WGRAPH = struct datatype vertex = V of (vertex*int) list ref type edge = vertex*vertex*int type graph = vertex list fun eq(V(v1), V(v2)) = (v1 = v2) fun vertices(g) = g fun edgeinfo(e) = e fun outgoing(V(lr)) = map (fn(dst,w) => (V(lr), dst, w)) (!lr) fun add_vertex(g: graph): graph*vertex = let val v = V(ref []) in (v::g, v) end fun add_edge(src: vertex, dst: vertex, weight: int) = case src of V(lr) => lr := (dst,weight)::(!lr) end
How could we store a graph to make determination of the existence of an edge in constant time possible? We could store the adjacency information in a matrix of |V|×|V| booleans indicating whether a given edge is in the graph. (Draw the adjacency matrices for the above graphs.) Adjacency matrices require O(V2) space regardless of the number of edges, so they work well for dense graphs -- graphs in which the number of edges is large compared to the number of vertices. Adjacency lists, on the other hand, work well for sparse graphs.
In terms of |V|, how big could |E| possibly become? So what is the potential space requirement of adjacency lists in terms of |V|?
Answer the same running time questions for adjacency matrices as we did for adjacency lists.
How does our representation change for directed and undirected graphs? How could we represent weighted graphs?
Both the adjacency list and the adjacency matrix are vertex-centric representations. As an alternative, we could use an edge-centric representation: represent graphs using a list of named edges for each node. For directed graphs, one can represent outgoing and/or incoming edges for each node. Note that incoming edges can be inferred from the outgoing edges, but one needs to explore the whole data structure to figure out the incoming edges of a single node.
The most basic graph operation is to traverse a graph. The key idea in graph traversal is to mark the vertices as we visit them and to keep track of what we have not yet explored. We will consider each vertex to be in one of the states undiscovered, discovered, or completely explored. We'll start with only a single discovered vertex, and we'll consider each incident edge. If an edge connects to an undiscovered vertex, we'll mark the vertex as discovered and add it to our set of vertices to process. After we've looked at all the edges of a vertex, we'll grab another vertex from the set of discovered vertices. The order in which we explore the vertices depends on how we maintain the collection of discovered vertices.
(* Simple graph traversal (BFS or DFS) *) let val disc = new_collection() (* queue for BFS, stack for DFS *) val visited = new_set() (* Expand the visited set to contain everything v goes to, * and add newly seen vertices to the queue. *) fun expand(v: vertex) = let fun handle_edge(e: edge): unit = let val (v, v') = Graph.edgeInfo(e) in if not (member(visited,v')) then ( add(visited, v'); insert(disc, v') ) else () end in app handle_edge (Graph.outgoing(v)) end in add(visited, v0); insert(disc, v0); while (not (empty_queue(disc))) do expand(remove(disc)) end
If we use a queue dor disc (the set of discovered edges) , we explore the oldest unexplored vertices first. This is known as a breadth-first search, or BFS. Each vertex (except the starting vertex) is discovered during the processing of one other vertex, so this defines the breadth-first search tree. Notice that the shortest path from the root to any other vertex is a path in this tree. And for undirected graphs, nontree edges can point only to vertices on the same level as the parent vertex or to vertices on the level directly below the parent. Why?
If we use a stack to store the discovered vertices instead, we venture along a single path away from the starting vertex until there are no more undiscovered vertices in front of us. This is known as a depth-first search. Just as with breadth-first search, we can define a depth-first search tree that results from our traversal. In a depth-first search of an undirected graph, every edge is either a tree edge or a back edge; there are no forward edges or cross edges. Why?
One of the most important and useful algorithms is Dijkstra's shortest path algorithm, a greedy algorithm that efficiently finds shortest paths in a graph. (About pronunciation: "Dijkstra" is Dutch and starts out like "dike").
Because graphs are able to represent many things, many problems can be cast as shortest-path problems, making this algorithm a powerful and general tool. For example, Dijkstra's algorithm is a good way to implement a service like MapQuest, which finds the shortest way to drive between two points on the map. It can also be used to solve problems like network routing, where the goal is to find the shortest path for data packets to take through a switching network. It is also used in more general search algorithms for a variety of problems ranging from automated circuit layout to speech recognition.
A path through the graph is a sequence (v1, ..., vn) such that the graph contains an edge e1 going from v1 to v2, an edge e2 going from v2 to v3, and so on. That is, all the edges must be traversed in the forward direction. The length of a path in a weighted graph is the sum of the weights along these edges e1, ..., en−1. We call this property "length" even though for some graphs it may represent some other quantity: for example, money or time.
Let's consider the problem of single-source shortest path problem for an
unweighted directed graph. In this case we are trying to find the smallest
number of edges that must be traversed in order to get to every vertex in the
graph. This is the same problem as solving the weighted version where all the
weights happen to be 1. Do we know an algorithm for determining this? Yes: the
breadth-first search algorithm above. We augment the visited set to keep track
of the number of edges traversed from v0;
it becomes a map dist
from vertices to edge counts(ints), possibly
implemented as a hash table (an even more imperative alternative is to store
distances directly in the vertices). The only algorithmic modification needed is
in expand
, which adds to the frontier a newly found vertex at a
distance one greater than that of its neighbor already in the frontier:
(* unweighted single-source shortest path *) let val q = new_queue() (* a map from visited vertices to their distances *) val dist = new_map() fun expand(v: vertex) = let val d: int = valOf(get(dist, v)) fun handle_edge(e: edge) = let val (v, v') = Graph.edgeinfo(e) in case get(dist, v') of SOME(d') => () (* note: d' <= d+1 *) | NONE => ( add(dist, v', d+1); enqueue(q, v') ) end in app handle_edge (Graph.outgoing(v)) end in add(visited, v0, 0); enqueue(q, v0); while (not (empty_queue(q))) do expand(dequeue(q)) end
Now we can generalize to the problem of computing the shortest path between
two vertices in a weighted graph. We can solve this problem by making minor
modifications to the BFS algorithm for shortest paths in unweighted graphs. As
in that algorithm, we keep a visited map that maps vertices to their distances
from the source vertex v0. We change expand
so that
instead of adding 1 to the distance, its adds the weight of the edge traversed.
There are also two additional changes. First, the queue must be replaced by a
priority queue. where the priorities of the vertices in the queue are
their distances recorded in visited. That is, dequeue(q)
should be
a priority queue extract_min
operation that removes the vertex with
the smallest distance. Second, whenever we explore a path that leads to a vertex
v' that has been reached before, we need to check whether the new path to v' is
an improvement over the old one. This is done by the if statement below.
The algorithm is as follows:
(* Dijkstra's Algorithm *) let fun ordering((v1,d1), (v2,d2)) = Int.compare(d1,d2) val q: (vertex*int) prio_queue = new_prio_queue(ordering) val dist: VertexMap.map = VertexMap.create() fun expand(v: vertex, d: int) = let fun handle_edge(e: edge) = let val (_, v', w) = Graph.edgeinfo(e) in case get(dist, v') of SOME(d') => if d+w < d' then (add(dist, v', d+w); incr_priority(q, v', d+w) ) else () | NONE => (add(dist, v', d+w); insert(q, v', d+w) ) end in app handle_edge (Graph.outgoing(v)) end in add(visited, v0, 0); insert(q, (v0, 0)); while (not (empty_queue(q))) do expand(extract_min(q)) end