Dijkstra's single-source shortest path algorithm
Topics:
- The single-source shortest path problem
- Dijkstra's algorithm
- Proving correctness of the algorithm
- Extensions: generalizing distance
- Extensions: A*
The single-source shortest path problem

Suppose we have a weighted directed graph, and we want to find the path between two vertices with the minimum total weight. Interpreting edge weights as distances, this is a shortest-path problem. In the example illustrated, there is a path with total weight 50, but it may not be easy to find. We could find the shortest path by enumerating all possible permutations of the vertices and checking the total weight of each, but there are O(V!) such permutations.
Finding shortest paths in graphs is very useful. We have already seen an algorithm for finding shortest paths when all weights are 1, namely breadth-first search. BFS finds shortest paths from a root node or nodes to all other nodes when the distance along a path is simply the number of edges. In general it is useful to have edges with different “distances”, so shortest-path problems are expressed in terms of weighted graphs, where the weights represent distances. (Weighted undirected graphs can be regarded as weighted directed graphs with directed edges of equal weight in each direction.)
Both breadth- and depth-first search are instances of the abstract tricolor algorithm. The tricolor algorithm is nondeterministic: it allows a range of possibilities for each next step. BFS and DFS resolve the nondeterminism in different ways and visit nodes in different orders. In this lecture we will see another instance of the tricolor algorithm due to Edsger Dijkstra, which solves the single-source shortest path problem. The algorithm is known as Dijkstra's algorithm.
For simplicity, we will assume that all edge weights are nonnegative. The Bellman-Ford algorithm solves the same problem and can be used in the presence of negative weights, provided there are no negative-weight cycles. Minimum distance doesn't make sense in graphs with negative-weight cycles, because one could traverse a negative-weight cycle an arbitrary number of times, and there would be no lower bound on the weight.
We will solve the problem of finding the shortest path from a given root node to all other nodes. To find shortest paths from multiple root nodes, the single-source shortest path algorithm can be used repeatedly, once for each starting node; but if the graph is dense, it is more efficient to use an algorithm designed for the all-pairs shortest-path problem. The Floyd-Warshall algorithm is the standard algorithm for this task and is covered in CS 3110 or CS 4820.
Dijkstra's algorithm
Dijkstra's algorithm is a generalization of BFS. Like BFS, it visits nodes in order of distance from the root, although in this case “distance” is defined more generally as the length of a shortest path, where “length” and “shortest” refer to the sum of the edge weights along the path.
Also like BFS, Dijkstra's algorithm is an instance of the tricolor algorithm. Recall that the tricolor algorithm changes nodes from white to gray and from gray to black, never allowing an edge from a black node to a white node. In Dijkstra's algorithm, the nodes are not discovered (turned from white to gray) in distance order, but they are finished (turned from gray to black) in distance order.
The progress of the algorithm is depicted by the following figure. Initially the root is gray and all other nodes are white. As the algorithm progresses, reachable white nodes turn into gray nodes when they are first discovered, and these gray nodes eventually turn into black nodes. Eventually there are no gray nodes left and the algorithm is done.

Like BFS and the iterative version of DFS, Dijkstra's algorithm can be implemented as a worklist algorithm. Recall that in BFS and DFS, the worklist is a queue and a stack, respectively. In Dijkstra's algorithm, the worklist is a priority queue, which we will explain shortly. The three colors represent the following situations:
- White nodes are undiscovered. Their distance field is set to ∞ because no path to them has been found yet. (v.dist = ∞)
- Gray nodes are frontier nodes. They are on the priority queue, which we call frontier. Their distance field is finite and is set to the distance along the shortest known path from the root to that node. (v.dist < ∞, v ∈ frontier)
- Black nodes are finished nodes. The value of their distance field is the true minimum distance from the root, and they are not on the priority queue. (v.dist < ∞, v ∉ frontier)
Dijkstra's algorithm starts with the root in the priority queue at distance 0 (root.dist = 0), so it is gray. All other nodes are at distance ∞ and are white. The algorithm then works roughly like BFS, except that when a node is popped from the priority queue, it is the node with the smallest v.dist value among all those on the queue. When the algorithm completes, all reachable nodes are black and hold their true minimum distance.
Here is the pseudo-code:
1  frontier = new PriorityQueue();
2  root.dist = 0;
3  frontier.push(root);
4  while (frontier not empty) {
5      v = frontier.pop(); // extracts the element with smallest v.dist on the queue
6      for ((v,w) in E) {
7          if (w.dist == ∞) {
8              w.dist = v.dist + weight(v,w);
9              frontier.push(w); // color w gray
10         } else {
11             w.dist = min(w.dist, v.dist + weight(v,w));
12             // w was already gray, but update to shorter distance
13             frontier.adjust_priority(w);
14         }
15     }
16     // color v black
17 }
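For concreteness, here is a minimal runnable Java sketch of the same algorithm; the class and method names are ours, not part of the notes. Java's PriorityQueue has no adjust_priority operation, so this sketch uses the standard "lazy deletion" workaround: push a fresh entry whenever a distance improves, and skip stale entries when they are popped.

    import java.util.*;

    class Dijkstra {
        /** graph.get(v) maps each successor w of v to weight(v, w). */
        static int[] shortestDistances(List<Map<Integer, Integer>> graph, int root) {
            int n = graph.size();
            int[] dist = new int[n];
            Arrays.fill(dist, Integer.MAX_VALUE);   // "∞": white, undiscovered
            dist[root] = 0;
            // Entries are {vertex, distance}, ordered by distance.
            PriorityQueue<int[]> frontier =
                new PriorityQueue<>((a, b) -> Integer.compare(a[1], b[1]));
            frontier.add(new int[]{root, 0});
            while (!frontier.isEmpty()) {
                int[] entry = frontier.poll();      // smallest distance on the queue
                int v = entry[0];
                if (entry[1] > dist[v]) continue;   // stale entry: v is already black
                for (Map.Entry<Integer, Integer> e : graph.get(v).entrySet()) {
                    int w = e.getKey();
                    int newDist = dist[v] + e.getValue();
                    if (newDist < dist[w]) {        // found a shorter path to w
                        dist[w] = newDist;
                        frontier.add(new int[]{w, newDist}); // color w gray / requeue
                    }
                }
                // v is now black: dist[v] is its true minimum distance
            }
            return dist;
        }

        public static void main(String[] args) {
            // Tiny example: edges 0->1 (weight 9), 0->2 (weight 14), 1->2 (weight 2).
            List<Map<Integer, Integer>> graph =
                List.of(Map.of(1, 9, 2, 14), Map.of(2, 2), Map.of());
            System.out.println(Arrays.toString(shortestDistances(graph, 0))); // [0, 9, 11]
        }
    }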
Dijkstra's algorithm is an example of a greedy algorithm: an algorithm in which simple local choices (like choosing the gray vertex with the smallest v.dist) lead to globally optimal results (minimal distances computed to all vertices). Many other problems, such as finding a minimum spanning tree, can be solved by greedy algorithms. Other problems, such as Traveling Salesperson, cannot.
Example
Let us run the algorithm on the graph above with v0 as the root. In the first loop iteration, we pop v0 and push its three successors, updating their distances accordingly. The root v0 turns black.
In the second iteration, we pop the node with the lowest distance (9) and add its successor to the queue at distance 9+23=32. The new gray frontier consists of the three nodes shown. The node at distance 9 turns black; 9 is the shortest distance to that node.

After the second iteration
The next node to come off the queue is the one at distance 14. Inspecting its successors, we discover a new white vertex and set its distance to 14+30=44. We also discover a new path of length 14+17=31 to a node already on the frontier, which is shorter than the previous best known path of length 32, so we adjust its distance.

After the third iteration
We continue in this fashion until the queue becomes empty. At that point, all reachable nodes are black, and we have computed the minimum distance to each of them.

Recovering the shortest path
We have shown how to calculate the minimum distance from the root to all reachable nodes, but how do we recover the actual shortest paths? A slight modification of the algorithm does the trick. Whenever the w.dist value decreases in line 8 or 11, we save a back-pointer from w to v, as the shortest known path from the root to w at that point goes through v. When the algorithm is finished, we can follow these back-pointers from any node along a shortest path back to the root.
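A sketch of this modification, building on the Java version above (the prev array and the pathTo method are our own illustrative names): whenever dist[w] decreases via edge (v, w), record prev[w] = v; afterward, a path is recovered by walking the back-pointers.

    import java.util.*;

    class PathRecovery {
        /** Recover the root-to-target path from back-pointers. Assumes prev[w] was
         *  set to v each time dist[w] decreased via edge (v, w) during the search. */
        static List<Integer> pathTo(int[] prev, int root, int target) {
            LinkedList<Integer> path = new LinkedList<>();
            for (int u = target; u != root; u = prev[u])
                path.addFirst(u);                  // walk back-pointers toward the root
            path.addFirst(root);
            return path;                           // vertices in root-to-target order
        }

        public static void main(String[] args) {
            // Back-pointers for the tiny graph above: the shortest path to vertex 2
            // is 0 -> 1 -> 2, so prev[1] = 0 and prev[2] = 1.
            int[] prev = {0, 0, 1};
            System.out.println(pathTo(prev, 0, 2)); // [0, 1, 2]
        }
    }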
Another approach is as follows. Observe that along any shortest path containing an edge (v,w) of weight d, we must have w.dist = v.dist + d. We can't have w.dist > v.dist + d, because there is clearly a path to w with length v.dist + d. And we can't have w.dist < v.dist + d, because that would mean the original path through (v,w) was not a shortest path.
We can find shortest paths by working backward from any
vertex, at each step finding a predecessor satisfying this equation.
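A sketch of this second approach (again with our own illustrative names), which scans the incoming edges of each node for a predecessor v satisfying dist[w] == dist[v] + weight(v, w):

    import java.util.*;

    class PathByEquation {
        /** graph.get(v) maps each successor w of v to weight(v, w); dist holds the
         *  distances already computed by Dijkstra's algorithm. */
        static List<Integer> pathTo(List<Map<Integer, Integer>> graph, int[] dist,
                                    int root, int target) {
            LinkedList<Integer> path = new LinkedList<>();
            int w = target;
            path.addFirst(w);
            while (w != root) {
                // Some predecessor v with dist[w] == dist[v] + weight(v, w) must
                // exist, since w lies on a shortest path from the root.
                for (int v = 0; v < graph.size(); v++) {
                    Integer d = graph.get(v).get(w);
                    if (d != null && dist[v] != Integer.MAX_VALUE
                            && dist[w] == dist[v] + d) {
                        w = v;
                        break;
                    }
                }
                path.addFirst(w);
            }
            return path;
        }

        public static void main(String[] args) {
            List<Map<Integer, Integer>> graph =
                List.of(Map.of(1, 9, 2, 14), Map.of(2, 2), Map.of());
            int[] dist = {0, 9, 11};   // distances from the earlier example
            System.out.println(pathTo(graph, dist, 0, 2)); // [0, 1, 2]
        }
    }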
The thick edges in the previous figure show the shortest paths from the root v0
to every other node. They form a spanning tree that includes every vertex in the graph and shows
the minimum distance to each of them.
Performance of Dijkstra's algorithm
The algorithm does some work per vertex and some work per edge. The outer loop happens once for each vertex. The loop uses the priority queue to retrieve the vertex of minimum distance among the vertices on the queue, and there can be up to n = |V| vertices on the queue. If we implement the priority queue with a balanced binary tree, we can retrieve the minimum vertex in O(log n) time. (We cannot hope to do much better than this, otherwise we could use a priority queue to sort faster than the known lower bound of Ω(n log n) for comparison sorting.)
We also do work proportional to the number of edges m = |E|; in the worst case every edge will cause the distance to its sink vertex to be adjusted. The priority queue needs to be informed of the new distance. This will take O(log n) time per edge if we use a balanced binary tree. So the total time taken by the algorithm is O(n log n) + O(m log n). Since the number of reachable vertices is O(m), this is O(m log n).
It turns out there is a way to implement a priority queue as a data structure called a Fibonacci heap, which makes the operation of updating the distance O(1). The total time is then O(n log n + m). However, Fibonacci heaps have poor constant factors, so they are rarely used in practice. They only make sense for very large, dense graphs where m is not O(n).
Correctness of Dijkstra's algorithm
To show this algorithm works as advertised, we will need a loop invariant.
At any stage of the algorithm, define an interior path to a node v to be a path from the root to v that goes through only black vertices, except possibly for v itself.
The loop invariant for the outer loop is then the usual tricolor black–white invariant, plus the following:
1. For every black vertex v and gray vertex w, v.dist ≤ w.dist.
2. For every gray or black vertex v, v.dist is the length of the shortest interior path to v.
Initialization. Part 1 of the invariant is vacuously true initially, because there are no black vertices. Part 2 is true initially, because the root is the only gray or black vertex, and root.dist is 0, which is the length of the shortest path from the root to itself.
Postcondition. When the loop finishes, all paths are interior paths, so by part 2 of the invariant, each v.dist is the length of the shortest path to v.
Preservation. Suppose the invariant holds at the beginning of the loop. We consider each of the two parts of the invariant in turn.
1. Each iteration of the loop first colors the minimum-distance gray node v black. This preserves part 1, since the new black node was minimum over all gray nodes. Some new white successors of v will now be colored gray, but their new distances will all be greater than or equal to v.dist too. So the first part of the invariant holds. We can see from this that vertices are moved from gray to black in order of nondecreasing distance.
2. We need to show that we have the correct interior-path distance to every gray or black vertex. Coloring v black doesn't create new interior paths to it, but it might create new paths to other nodes.
We don't have to worry about new interior paths to any node that is not a successor of v, because any such interior path must have as its second-to-last step a black node other than v, one that is already at a closer distance than v. So going through v cannot produce a shorter interior path.
However, we still need to think about the interior paths to successors of v. Consider an arbitrary successor w encountered in the inner loop. It can have one of three colors:
- Black. By part 1 of the invariant, any black successor w must have w.dist ≤ v.dist, so there can be no shorter interior path via v.
- White. By the tricolor invariant, there is no preexisting interior path to w. Therefore the new path via v, with distance v.dist + weight(v,w), is the only interior path.
- Gray. If the node is already on the frontier, there is an existing interior path whose length is recorded in w.dist. In addition, there is a new interior path to w via v, with total distance v.dist + weight(v,w). The code compares these two paths and picks the shorter one. Coloring v black can also create other new interior paths that don't go directly to w via the new edge, but instead go via some other black node x. By invariant part 1, x.dist ≤ v.dist, so any such interior path must be at least as long as the preexisting interior path that goes from the root directly to x and then to w, and we know by invariant part 2 that that path is no shorter than w.dist.
Generalizing shortest paths
The weights in a shortest path problem need not represent distance, of course. They could represent time, for example. They could also represent probabilities. For example, suppose that we had a state machine that transitioned to new states with some given probability on each outgoing edge. The probability of taking a particular path through the graph would then be the product of the probabilities on edges along that path. We can then answer questions such as, “What is the most likely path that the system will follow in order to arrive at a given state?” by solving a shortest path problem in a weighted graph in which the weights are the negative logarithms of the probabilities, since (−log a) + (−log b) = −log ab.
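As a small illustration of this transformation (the class and method names here are our own):

    class ProbabilityWeights {
        // Edge probabilities lie in (0, 1], so -log p >= 0, and the
        // nonnegative-weight requirement of Dijkstra's algorithm is satisfied.
        static double weight(double p) { return -Math.log(p); }

        public static void main(String[] args) {
            // A path with step probabilities 0.5 and 0.8 has probability 0.4;
            // its total weight is (-log 0.5) + (-log 0.8) = -log 0.4, so the
            // minimum-weight path is exactly the maximum-probability path.
            System.out.println(weight(0.5) + weight(0.8)); // ≈ 0.9163
            System.out.println(weight(0.4));               // ≈ 0.9163
        }
    }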
Heuristic search (A*)
Dijkstra's algorithm is a simple and efficient way to find the shortest path from a root node to all other nodes in a graph. However, if we are only interested in the shortest path to a particular vertex, it may search much more of the graph than necessary. One simple optimization is to stop the algorithm at the point when the destination vertex is popped off the priority queue. We know that v.dist is the minimal distance to v at that point, because that is when it is colored black.
A second and more interesting optimization is to take advantage of more knowledge of distances in the graph. Dijkstra's algorithm searches outward in distance order from the source node and finds the best route to every node in the graph. In general we don't want to know every best route. For example, if we are trying to go from NYC to Mount Rushmore, we don't need to explore the streets of Miami, but that is what Dijkstra's algorithm will do.

The A* search algorithm generalizes Dijkstra's algorithm to take advantage of knowledge about the node we are trying to find. The idea is that for any given vertex, we may be able to estimate the remaining distance to the destination. We define a heuristic function h(v) that constructs this estimate. Then, the priority queue uses v.dist + h(v) as the priority instead of just v.dist. Depending on how accurate the heuristic function is, this change causes the search to focus on nodes that look like they are in the right direction.
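A minimal runnable sketch of this change, in the style of the earlier Dijkstra code (the Heuristic interface and all names are our own; we assume an admissible heuristic, so the search can stop when the goal is popped):

    import java.util.*;

    class AStar {
        interface Heuristic { double estimate(int v); } // h(v): estimated remaining distance

        static double distanceTo(List<Map<Integer, Double>> graph,
                                 int root, int goal, Heuristic h) {
            double[] dist = new double[graph.size()];
            Arrays.fill(dist, Double.POSITIVE_INFINITY);
            dist[root] = 0;
            // Entries are {vertex, priority}, where priority = dist[v] + h(v).
            PriorityQueue<double[]> frontier =
                new PriorityQueue<>((a, b) -> Double.compare(a[1], b[1]));
            frontier.add(new double[]{root, h.estimate(root)});
            while (!frontier.isEmpty()) {
                int v = (int) frontier.poll()[0];
                if (v == goal) return dist[v]; // admissible h: goal popped => optimal
                for (Map.Entry<Integer, Double> e : graph.get(v).entrySet()) {
                    int w = e.getKey();
                    double newDist = dist[v] + e.getValue();
                    if (newDist < dist[w]) {
                        dist[w] = newDist;
                        frontier.add(new double[]{w, newDist + h.estimate(w)});
                    }
                }
            }
            return Double.POSITIVE_INFINITY;   // goal unreachable
        }

        public static void main(String[] args) {
            // Line graph 0 -> 1 -> 2 with unit weights; h(v) = hops remaining to 2.
            List<Map<Integer, Double>> g =
                List.of(Map.of(1, 1.0), Map.of(2, 1.0), Map.of());
            System.out.println(distanceTo(g, 0, 2, v -> 2 - v)); // 2.0
        }
    }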
Adding the heuristic to the node distance effectively changes the weights of edges. Given an edge from vertex v to w with weight d, the change in priority along the edge is d + h(w) − h(v). If h(v) is a perfect estimator of remaining distance, this quantity will be zero along the optimum path; in that case we can trivially "search" for the optimum path by simply following such zero-weight edges. More realistically, h(v) will be inaccurate. If it is inaccurate by up to an amount ε, we end up searching around the optimum path in a region whose size is determined by ε.
However, we have to be careful when defining h(v). If for some edge (v,w) of weight d the total distance including the heuristic decreases along the edge (that is, the effective weight d + h(w) − h(v) is negative), the conditions of Dijkstra's algorithm (nonnegative edge weights) are no longer met. In that case we have an inadmissible heuristic that could cause us to miss the optimal solution. A heuristic function is admissible if it never overestimates the true remaining distance. When searching with an admissible heuristic, each node is seen only once, and the optimum path has been found as soon as the goal node is reached.
If using an inadmissible heuristic, negative-weight edges might cause us to find a new path to an already "finished" node, pushing it back onto the frontier queue. Even after finding the goal node at some distance D, it is necessary to keep searching above priority level D to account for these negative-weight edges. Despite this issue, the reason we might want to use an inadmissible heuristic is that it can be more accurate than an admissible heuristic, and thus guide the search more effectively toward the goal. In some cases inadmissible heuristics can pay off.
The A* search algorithm is often used for single-player game search and various other optimization problems.