Priority Queues and Heaps

For Dijkstra's shortest-path algorithm, we needed a priority queue: a queue in which each element has a priority and from which elements are removed in priority order; that is, the next element to be removed from the queue is always the element of highest priority currently in the queue. A priority queue is an abstraction with several important uses, and it can be implemented efficiently, as we will see.

We have already seen that priority queues are useful for implementing Dijkstra's algorithm and A* search. In these applications, the priority is the best-known estimate of a shortest distance. Such a queue is called a min-queue because the smaller the distance, the higher the priority. There are also max-queues in which larger numbers correspond to higher priorities.
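As a small illustration of the min-queue/max-queue distinction, the sketch below uses Java's standard-library class java.util.PriorityQueue. That is a library class, not the PriorityQueue interface discussed in this section. It orders elements smallest-first by default, and passing Collections.reverseOrder() as the comparator makes it behave as a max-queue. The class name MinMaxQueueDemo is made up for the example.

```java
import java.util.Collections;
import java.util.PriorityQueue;

class MinMaxQueueDemo {
    // Natural ordering: poll() returns the smallest element, so this is a min-queue.
    static int firstOutOfMinQueue(int... values) {
        PriorityQueue<Integer> minQ = new PriorityQueue<>();
        for (int v : values) minQ.add(v);
        return minQ.poll();
    }

    // Reversed comparator: larger numbers now mean higher priority, so this is a max-queue.
    static int firstOutOfMaxQueue(int... values) {
        PriorityQueue<Integer> maxQ = new PriorityQueue<>(Collections.reverseOrder());
        for (int v : values) maxQ.add(v);
        return maxQ.poll();
    }

    public static void main(String[] args) {
        System.out.println(firstOutOfMinQueue(7, 2, 9, 4)); // prints 2
        System.out.println(firstOutOfMaxQueue(7, 2, 9, 4)); // prints 9
    }
}
```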

Another application in which priority queues are very useful is event-driven simulation. Events need to be processed in the order in which they occur, so a min-queue is used to store unprocessed events, with each event's priority being a timestamp indicating its time of occurrence. Handling one event can generate new future events, which are added to the queue.
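A minimal sketch of such a simulation loop, using java.util.PriorityQueue with a comparator on timestamps. The Event class and the rule that every "arrive" event schedules a matching "depart" event 5 time units later are hypothetical, chosen only to show new events being generated while the queue is being drained.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class EventSimulation {
    // Hypothetical event: a timestamp plus a label.
    static final class Event {
        final double time;
        final String label;
        Event(double time, String label) { this.time = time; this.label = label; }
    }

    // Handles events in timestamp order and returns "label@time" strings in handling order.
    // Toy rule (an assumption for illustration): "arriveX" schedules "departX" 5 time units later.
    static List<String> run(List<Event> initial) {
        PriorityQueue<Event> pending =
            new PriorityQueue<>(Comparator.comparingDouble((Event e) -> e.time));
        pending.addAll(initial);
        List<String> handled = new ArrayList<>();
        while (!pending.isEmpty()) {
            Event e = pending.poll();  // earliest unprocessed event
            handled.add(e.label + "@" + e.time);
            if (e.label.startsWith("arrive")) {
                // Handling this event generates a future event, which goes back into the queue.
                String id = e.label.substring("arrive".length());
                pending.add(new Event(e.time + 5, "depart" + id));
            }
        }
        return handled;
    }
}
```

For initial events arrive1 at time 0 and arrive2 at time 3, run handles arrive1@0.0, arrive2@3.0, depart1@5.0, and depart2@8.0, in that order.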

Priority queues are also used in the compression algorithm known as Huffman coding, which produces an optimal prefix code for compressing individual symbols in a stream. Finally, priority queues can be used for sorting: the elements to be sorted are pushed into the priority queue and then removed in sorted order.
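Sorting with a priority queue takes only a few lines. This sketch assumes int inputs and relies on java.util.PriorityQueue, whose add and poll methods run in O(log n) time, giving O(n log n) overall. PqSort is a hypothetical name.

```java
import java.util.PriorityQueue;

class PqSort {
    // Push every element into a min-queue, then remove them one at a time in increasing order.
    static int[] sort(int[] input) {
        PriorityQueue<Integer> pq = new PriorityQueue<>();
        for (int x : input) pq.add(x);
        int[] out = new int[input.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = pq.poll();  // smallest remaining element
        }
        return out;
    }
}
```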

Priority queue interface

A priority queue can be described via the following interface for a min-queue:

PriorityQueue.java

The methods described in this interface suffice to implement Dijkstra's shortest path algorithm.
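A hypothetical sketch of what such a min-queue interface might look like in Java. Only add, extractMin, and increasePriority are named in the text; the remaining names and signatures (MinQueue, isEmpty, the double priority parameter, ListMinQueue) are assumptions. The unordered-list implementation is only a simple baseline: add is O(1), but extractMin and increasePriority take linear time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical min-queue interface: smaller priority values mean higher priority.
interface MinQueue<E> {
    void add(E x, double priority);                  // insert x with the given priority
    E extractMin();                                  // remove and return the highest-priority element
    void increasePriority(E x, double newPriority);  // lower x's value to newPriority
    boolean isEmpty();
}

// Baseline implementation on an unordered list.
class ListMinQueue<E> implements MinQueue<E> {
    private final List<E> elems = new ArrayList<>();
    private final List<Double> prios = new ArrayList<>();

    public void add(E x, double priority) {
        elems.add(x);
        prios.add(priority);
    }

    public E extractMin() {
        if (elems.isEmpty()) throw new NoSuchElementException("empty queue");
        int best = 0;
        for (int i = 1; i < prios.size(); i++) {
            if (prios.get(i) < prios.get(best)) best = i;
        }
        E result = elems.get(best);
        // Fill the hole with the last entry so the removal itself is O(1).
        int last = elems.size() - 1;
        elems.set(best, elems.get(last));
        prios.set(best, prios.get(last));
        elems.remove(last);
        prios.remove(last);
        return result;
    }

    public void increasePriority(E x, double newPriority) {
        int i = elems.indexOf(x);  // linear search: the lookup cost a faster queue must avoid
        if (i < 0) throw new IllegalArgumentException("not in queue: " + x);
        if (newPriority > prios.get(i)) throw new IllegalArgumentException("would lower priority");
        prios.set(i, newPriority);
    }

    public boolean isEmpty() { return elems.isEmpty(); }
}
```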

To implement increasePriority(), an implementation must have a fast way to find the element whose priority is to be updated. This can be accomplished by using a hash table to look up the position of elements in the underlying data structure, or by storing each element's position in the queue inside the element itself. For example, the priority queue constructor might be passed an object implementing the following interface for manipulating the elements in the queue:

ElemOps.java
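One possible shape for such an interface, sketched under assumptions: the names ElemOps, Vertex, VertexOps, and TrackedHeap, and the three methods below, are hypothetical. The heap writes each element's array index back into the element whenever it moves it, so increasePriority can find the element in O(1) time and then bubble it up in O(log n) time. Only add and increasePriority are shown; extractMin would route its moves through place() in the same way.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical element operations supplied to the queue's constructor.
interface ElemOps<E> {
    double priority(E e);           // smaller value = higher priority
    int getIndex(E e);              // where e currently sits in the heap array
    void setIndex(E e, int index);  // called by the heap whenever e moves
}

// Hypothetical element type, e.g. a graph vertex in Dijkstra's algorithm.
class Vertex {
    final String name;
    double dist;
    int heapIndex = -1;  // -1 means "not in the queue"
    Vertex(String name, double dist) { this.name = name; this.dist = dist; }
}

class VertexOps implements ElemOps<Vertex> {
    public double priority(Vertex v) { return v.dist; }
    public int getIndex(Vertex v) { return v.heapIndex; }
    public void setIndex(Vertex v, int index) { v.heapIndex = index; }
}

// Partial min-heap: every move goes through place(), so stored indices stay correct.
class TrackedHeap<E> {
    private final List<E> a = new ArrayList<>();
    private final ElemOps<E> ops;
    TrackedHeap(ElemOps<E> ops) { this.ops = ops; }

    private void place(E e, int i) { a.set(i, e); ops.setIndex(e, i); }

    void add(E e) {
        a.add(e);
        ops.setIndex(e, a.size() - 1);
        bubbleUp(a.size() - 1);
    }

    // Call after e's priority value has been lowered (i.e., its priority raised).
    void increasePriority(E e) { bubbleUp(ops.getIndex(e)); }

    E peek() { return a.get(0); }

    private void bubbleUp(int i) {
        E e = a.get(i);
        while (i > 0) {
            int p = (i - 1) / 2;
            E parent = a.get(p);
            if (ops.priority(parent) <= ops.priority(e)) break;
            place(parent, i);  // move the parent down one level
            i = p;
        }
        place(e, i);
    }
}
```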

Binary Heaps

It is straightforward to implement priority queues with ordered or unordered lists. Ordered lists allow constant-time extractMin and linear-time add, whereas unordered lists allow constant-time add and linear-time extractMin.
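For comparison, here is a sketch of the ordered-list approach for int priorities (the class name is hypothetical). Keeping the list in descending order puts the minimum at the end of the ArrayList, where removal is constant-time; each add pays for a linear scan and shift.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

class SortedListMinQueue {
    // Kept in descending order, so the minimum is always the last element.
    private final List<Integer> items = new ArrayList<>();

    // O(n): walk left past every element smaller than x, then insert (shifting the tail).
    void add(int x) {
        int i = items.size();
        while (i > 0 && items.get(i - 1) < x) i--;
        items.add(i, x);
    }

    // O(1): remove the last element, which is the minimum.
    int extractMin() {
        if (items.isEmpty()) throw new NoSuchElementException("empty queue");
        return items.remove(items.size() - 1);
    }

    boolean isEmpty() { return items.isEmpty(); }
}
```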

However, there is a simple concrete data structure called a binary heap that allows both operations to be done in O(log n) time. (In computer science, the term heap is a bit overloaded. Binary heaps should not be confused with memory heaps. A memory heap is a low-level data structure used to keep track of the computer's memory so that the programming language implementation knows where to place objects in memory. This is not how we are using the term here.)

A binary heap is a binary tree satisfying the heap invariant:

(Heap Invariant) Every node n in the tree has the highest priority among all nodes in the subtree rooted at n. Equivalently, the priority of any node is at least as high as the priority of any of its children. Equivalently still, a heap stores its highest priority element at the root, and the left and right subtrees are also both heaps.

Here is an example of a binary heap in which smaller values have higher priority. Note that the root of each subtree contains the highest-priority element in that subtree.

It is possible to manipulate binary heaps as tree structures. However, additional speedup can be achieved if the binary heap satisfies a second invariant:

(Shape Invariant) For every h, if there exists a node n at depth h, then all $2^{h-1}$ possible nodes of depth h–1 exist in the tree, along with every possible node of depth h to the left of n. It follows that if h is the maximum depth of a node in the tree, the leaves of the tree occur only at depths h and h–1.

In fact, the example tree above also satisfies this shape invariant.

The shape invariant makes it possible to represent the binary heap as a resizable array. The nodes of the tree are placed in the array row by row from top to bottom, and each row is read from left to right. The heap structure illustrated above would be represented by the following array of length 9, with array indices shown on the bottom.

The nice thing about this representation is that it is possible to represent the tree structure without pointers. The shape invariant guarantees that the children of the node at index i are found at indices 2i+1 (left) and 2i+2 (right). Conversely, the parent of a node at index i is found at index ⌊(i–1)/2⌋. So we can walk up and down through the tree using simple arithmetic.
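The index arithmetic translates directly into code. This hypothetical helper class assumes a min-heap of ints stored in a[0..n-1], and it also checks the heap invariant by comparing every non-root element with its parent. Java's integer division truncates toward zero, so (i - 1) / 2 matches ⌊(i–1)/2⌋ only for i ≥ 1, which is the only case in which a parent exists.

```java
class HeapIndex {
    // Positions of a node's relatives in the row-by-row array layout.
    static int left(int i)   { return 2 * i + 1; }
    static int right(int i)  { return 2 * i + 2; }
    static int parent(int i) { return (i - 1) / 2; }  // valid for i >= 1

    // Heap invariant for a min-heap in a[0..n-1]: no element is smaller than its parent.
    static boolean isMinHeap(int[] a, int n) {
        for (int i = 1; i < n; i++) {
            if (a[i] < a[parent(i)]) return false;
        }
        return true;
    }
}
```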

Binary heap operations

Add

Adding a new node to the heap is done by placing the element at the end of the array, which preserves the shape invariant. However, the heap invariant may no longer hold, because the new element may have higher priority than its parent. To restore the heap invariant, we bubble up the element by swapping it with its parent until either it reaches the root or its parent has priority at least as high. Each swap moves the element up one level, and the shape invariant keeps the tree balanced with height at most log n, so at most log n swaps are needed and adds take O(log n) time.
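A minimal sketch of add for a min-heap of ints, assuming a hypothetical IntMinHeap class that doubles its backing array when it fills up.

```java
import java.util.Arrays;

class IntMinHeap {
    int[] a = new int[4];
    int size = 0;

    void add(int x) {
        if (size == a.length) a = Arrays.copyOf(a, 2 * a.length);  // grow the resizable array
        a[size] = x;  // placing x at the end preserves the shape invariant
        bubbleUp(size);
        size++;
    }

    // Swap with the parent until the element reaches the root or the parent is no larger.
    // Each swap moves up one level, so there are at most about log2(n) swaps.
    private void bubbleUp(int i) {
        while (i > 0) {
            int p = (i - 1) / 2;
            if (a[p] <= a[i]) break;
            int tmp = a[i]; a[i] = a[p]; a[p] = tmp;
            i = p;
        }
    }
}
```

Adding 5, 3, 8, 1, and 2 in that order leaves the first five array slots as [1, 2, 8, 5, 3].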

In the example above, if we add an element with priority 2, it is first placed at the end of the array, then bubbles up past the 5 and the 3, finally ending up where 3 was:

ExtractMin

The minimum element is always at the root (location 0 in the array). We can extract it, but we need to replace it with something. The last element in the array is a good candidate. We move it to the root of the tree. This reestablishes the shape invariant, but the heap invariant may now be broken. We fix the heap invariant by bubbling the element down (sometimes also called sifting down). The element is compared against its two children. If either child has higher priority, the element is swapped with the higher-priority child. The process repeats until either the element has priority at least as high as both of its children or it becomes a leaf.
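A sketch of extractMin for the same kind of int min-heap. The class name ExtractMinHeap and its array-copying constructor are assumptions; the constructor expects an array that already satisfies the heap invariant.

```java
import java.util.NoSuchElementException;

class ExtractMinHeap {
    int[] a;
    int size;

    // Assumes heapArray already satisfies the heap invariant.
    ExtractMinHeap(int[] heapArray) { a = heapArray.clone(); size = heapArray.length; }

    int extractMin() {
        if (size == 0) throw new NoSuchElementException("empty heap");
        int min = a[0];
        a[0] = a[--size];  // move the last element to the root: shape invariant restored
        bubbleDown(0);     // then restore the heap invariant
        return min;
    }

    void bubbleDown(int i) {
        while (true) {
            int l = 2 * i + 1, r = l + 1, smallest = i;
            if (l < size && a[l] < a[smallest]) smallest = l;
            if (r < size && a[r] < a[smallest]) smallest = r;
            if (smallest == i) return;  // no smaller child, or i is a leaf
            int t = a[i]; a[i] = a[smallest]; a[smallest] = t;  // swap with the higher-priority child
            i = smallest;
        }
    }
}
```

Starting from the heap [1, 2, 8, 5, 3], five calls to extractMin return 1, 2, 3, 5, and 8.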

Here is what happens with our example heap. We delete the 1 at the root and replace it with 5, which was the last element in the array. Then we bubble the 5 down. We compare it with 2 and 4. It has lower priority than both, so we swap it with the higher-priority child, which is 2. (We must swap it with the higher-priority child to maintain the heap invariant, because that child will become the parent of the other child.) We then compare it with its new children, 6 and 3, and swap it with 3, at which point 5 becomes a leaf. The heap invariant is now reestablished.

Again, at most log n swaps were needed.

IncreasePriority

We may wish to change the priority of an element in the queue. Assigning a new priority to an element maintains the shape invariant, but may break the heap invariant. To reestablish the heap invariant, we may need to bubble the element up if we increase the priority (which for a min-queue means decreasing the value) or down if we decrease the priority.
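A sketch of changing a priority in an int min-heap stored in a[0..n-1]. The static helper names are hypothetical, and the sketch assumes the caller already knows the element's index i (for example via one of the lookup techniques described earlier). A new value smaller than the old one is a priority increase and bubbles up; a larger one bubbles down.

```java
class ChangePriority {
    static void setValue(int[] a, int n, int i, int newValue) {
        int old = a[i];
        a[i] = newValue;
        if (newValue < old) bubbleUp(a, i);            // higher priority: move toward the root
        else if (newValue > old) bubbleDown(a, n, i);  // lower priority: move toward the leaves
    }

    static void bubbleUp(int[] a, int i) {
        while (i > 0 && a[i] < a[(i - 1) / 2]) {
            int p = (i - 1) / 2;
            swap(a, i, p);
            i = p;
        }
    }

    static void bubbleDown(int[] a, int n, int i) {
        while (true) {
            int l = 2 * i + 1, r = l + 1, best = i;
            if (l < n && a[l] < a[best]) best = l;
            if (r < n && a[r] < a[best]) best = r;
            if (best == i) return;
            swap(a, i, best);
            i = best;
        }
    }

    static void swap(int[] a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }
}
```

On the heap [1, 2, 8, 5, 3], setting index 3 to 0 gives [0, 1, 8, 2, 3].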

HeapSort

The heapsort algorithm sorts an array by first heapifying it so that it satisfies the heap invariant. Then extractMin() is used repeatedly to read out the elements in increasing order. Heapifying takes linear time, as shown below, and each of the n calls to extractMin() takes O(log n) time, so the whole sort takes O(n log n) time.
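A sketch of heapsort on an int array that follows this description: heapify by bubbling down from the last node that has a child, then extract the minimum repeatedly. For simplicity the results go into a separate output array, so this version uses O(n) extra space; the common in-place variant uses a max-heap instead. The class name is hypothetical.

```java
class HeapSort {
    static int[] sort(int[] input) {
        int[] a = input.clone();  // leave the caller's array untouched
        int n = a.length;
        // Heapify: indices >= n/2 are leaves, so start at the last node with a child.
        for (int i = n / 2 - 1; i >= 0; i--) bubbleDown(a, n, i);
        int[] out = new int[a.length];
        for (int k = 0; k < out.length; k++) {
            out[k] = a[0];        // extractMin: take the root,
            a[0] = a[--n];        // replace it with the last element,
            bubbleDown(a, n, 0);  // and restore the heap invariant
        }
        return out;
    }

    static void bubbleDown(int[] a, int n, int i) {
        while (true) {
            int l = 2 * i + 1, r = l + 1, best = i;
            if (l < n && a[l] < a[best]) best = l;
            if (r < n && a[r] < a[best]) best = r;
            if (best == i) return;
            int t = a[i]; a[i] = a[best]; a[best] = t;
            i = best;
        }
    }
}
```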

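Heapifying can be done by bubbling every element down, working backward from the end of the array. The elements at indices ⌊n/2⌋ and above are leaves and cannot move down, so the loop can start at index ⌊n/2⌋–1, the last node that has a child: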

// Indices n/2 through n-1 hold leaves, so start at the last node that has a child.
for (int i = n / 2 - 1; i >= 0; i--) {
    bubbleDown(i);
}

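The total time required to do this is linear. An element can be bubbled down at most as many steps as its height in the tree (the length of the longest path from it down to a leaf). At most half the elements have height 1, at most a quarter have height 2, and in general at most $n/2^k$ elements have height k. So the total work is at most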

$$\frac{n}{2} + \frac{2n}{4} + \frac{3n}{8} + \frac{4n}{16} + \cdots + \frac{kn}{2^k} + \cdots = 2n.$$
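The final equality rests on the series $\sum_{k \ge 1} k/2^k = 2$. One way to check it is to subtract half of the series from itself:

```latex
\begin{aligned}
S &= \sum_{k \ge 1} \frac{k}{2^k} = \frac{1}{2} + \frac{2}{4} + \frac{3}{8} + \cdots \\
\frac{S}{2} &= \sum_{k \ge 1} \frac{k-1}{2^k} = 0 + \frac{1}{4} + \frac{2}{8} + \cdots \\
S - \frac{S}{2} &= \sum_{k \ge 1} \frac{1}{2^k} = 1, \qquad \text{so } S = 2.
\end{aligned}
```

Multiplying by n gives the bound of 2n.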

Treaps

A treap is a binary search tree that is balanced with high probability. This is achieved by ensuring the tree has exactly the same structure that it would have had if the elements had been inserted in random order. Each node in a treap contains both a key and a randomly chosen priority. The treap satisfies the BST invariant with respect to its keys, so elements can be found in the treap in (expected) logarithmic time. It also satisfies the heap invariant with respect to its priorities. This ensures that the tree structure is exactly what you would get if the elements had been inserted in priority order, and since the priorities are chosen at random, priority order is a random order.

For example, the following is a treap where the keys are the letters and the priorities are the numbers.

Elements are added to the treap in the usual way for binary search trees. However, this may break the heap invariant on priorities. To fix the heap invariant, the element is bubbled up through the treap using tree rotations that swap a node with its parent while preserving the BST invariant. If the node is x and it is the right child of a parent p, the tree rotation is performed by changing pointers so the data structure on the left turns into the data structure on the right.

Note that A, B, and C here represent entire subtrees that are not affected by the rotation except that their parent nodes may change. Note also that the rotation operation is reversible, so that if p has higher priority than x, we can perform the rotation above from right to left.

For example, adding a node D with priority 2 to the treap above results in the following rotations being done to restore the heap invariant:

This is exactly the tree structure we would have gotten if we had inserted all the nodes into a simple BST in the order specified by their priorities.
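A minimal treap sketch in Java, under these assumptions: keys are Comparable, priorities are ints where a smaller number means higher priority (matching the min-heap convention used in the examples, which is an assumption for the treap), and duplicate keys are ignored. The new node is inserted as in an ordinary BST, and on the way back up the recursion it is rotated above its parent whenever its priority is higher. rotateLeft is the rotation described in the text (x is the right child of p), and rotateRight is its reverse. All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class Treap<K extends Comparable<K>> {
    static final class Node<T> {
        final T key;
        final int priority;
        Node<T> left, right;
        Node(T key, int priority) { this.key = key; this.priority = priority; }
    }

    Node<K> root;
    private final Random rng = new Random();

    void insert(K key) { insert(key, rng.nextInt()); }  // random priority
    void insert(K key, int priority) { root = insert(root, key, priority); }

    // BST insertion; after each recursive call, rotate the child up if it has higher priority.
    private Node<K> insert(Node<K> t, K key, int priority) {
        if (t == null) return new Node<>(key, priority);
        int c = key.compareTo(t.key);
        if (c < 0) {
            t.left = insert(t.left, key, priority);
            if (t.left.priority < t.priority) t = rotateRight(t);
        } else if (c > 0) {
            t.right = insert(t.right, key, priority);
            if (t.right.priority < t.priority) t = rotateLeft(t);
        }
        return t;  // c == 0: duplicate key, nothing to do
    }

    // x is the right child of p: x takes p's place, p becomes x's left child,
    // and x's old left subtree (B) becomes p's right subtree.
    private static <T> Node<T> rotateLeft(Node<T> p) {
        Node<T> x = p.right;
        p.right = x.left;
        x.left = p;
        return x;
    }

    // Mirror image: x is the left child of p.
    private static <T> Node<T> rotateRight(Node<T> p) {
        Node<T> x = p.left;
        p.left = x.right;
        x.right = p;
        return x;
    }

    // In-order key list; it is sorted whenever the BST invariant holds.
    List<K> keysInOrder() {
        List<K> out = new ArrayList<>();
        inorder(root, out);
        return out;
    }

    private static <T> void inorder(Node<T> t, List<T> out) {
        if (t == null) return;
        inorder(t.left, out);
        out.add(t.key);
        inorder(t.right, out);
    }

    // Heap invariant on priorities: no child has a smaller priority number than its parent.
    boolean heapOrdered() { return heapOrdered(root); }

    private static <T> boolean heapOrdered(Node<T> t) {
        if (t == null) return true;
        if (t.left != null && t.left.priority < t.priority) return false;
        if (t.right != null && t.right.priority < t.priority) return false;
        return heapOrdered(t.left) && heapOrdered(t.right);
    }
}
```

For example, inserting B, A, E, C, D with priorities 5, 7, 3, 9, 2 puts D at the root with B and E as its children, which is the same tree a plain BST would have if D, E, B, A, C were inserted in that order.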