Let's look at another way to implement ordered sets. Here is an ordered set signature that is designed to support implementations of both set and map abstractions. We've added some operations to show the added power of ordered sets. The first function gives the first element in the set, and fold_forward iterates over the elements of the set in ascending order. We can similarly implement last and fold_backward from the set signature.
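The signature itself isn't reproduced in this excerpt; judging from the operations the treap code below implements, it would look roughly like this (a reconstruction, with guessed types for the operations the code leaves unimplemented):

```sml
signature ORDERED_FUNCTIONAL_SET = sig
  type key
  type elem
  type set
  val empty: unit -> set
  val lookup: set * key -> elem option
  (* add returns the new set and whether the key was already present *)
  val add: set * elem -> set * bool
  val remove: set * key -> set        (* type guessed; not implemented below *)
  val size: set -> int
  val first: set -> elem option
  val last: set -> elem option
  (* a folder takes a combining function, an initial accumulator, and a
   * starting key, and folds over the elements from that key onward *)
  type 'b folder = ((elem * 'b) -> 'b) -> 'b -> key -> set -> 'b
  val fold_forward: 'b folder
  val fold_backward: 'b folder
end
```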
We have already seen red-black trees, which are one good way to implement ordered sets. Red-black trees are nice because they guarantee O(lg n) insert, lookup, and deletion time, with good constant factors. However, if we are willing to accept probabilistic assurances of performance, there are other, simpler options for implementing ordered sets. Two well-known data structures for implementing ordered sets use randomness to achieve good average-case performance: skip lists and treaps. Treaps are simpler and probably faster.
The idea behind treaps is to use randomness to produce binary search trees that are usually balanced. As you know, binary search trees are effective if they are balanced. But if the elements of the tree are inserted in sorted order, the tree can end up being a linked list (or at least extremely unbalanced), leading to O(n) performance. On the other hand, if a set of elements is inserted in a random order, the expected distance in the tree to a randomly chosen element is O(lg n). To see why, imagine walking down the tree from the root to a leaf. At any given point on the walk, there is a subtree of (say) n elements below the current node. Suppose that we arrange the n elements of this subtree in key order. Because the elements were inserted in random order, the element at the current node lands at some position p within this ordered sequence, where p ranges from 1 to n. If we are looking for a randomly chosen element, then there is a 1/n probability that the current element is the one of interest. The left subtree contains p−1 elements, so there is a (p−1)/n probability that the element of interest is there. Correspondingly, the right subtree contains n−p elements, so there is an (n−p)/n probability that the element is there. The expected size of the subtree visited after one step of the walk, assuming position p, is therefore (p−1)·(p−1)/n + (n−p)·(n−p)/n + 1·(1/n). All values of p from 1 to n are equally likely, so the expected size of the next subtree is the sum of this expression over all p from 1 to n, divided by n:
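The sum works out as follows (a reconstructed derivation, using the identity that the sum of the first n−1 squares is (n−1)n(2n−1)/6):

```latex
\frac{1}{n}\sum_{p=1}^{n}\left(\frac{(p-1)^2}{n}+\frac{(n-p)^2}{n}+\frac{1}{n}\right)
  \;=\; \frac{2}{n^2}\sum_{k=0}^{n-1}k^2 \;+\; \frac{1}{n}
  \;=\; \frac{(n-1)(2n-1)}{3n} \;+\; \frac{1}{n}
  \;\approx\; \frac{2n}{3}
```

So one step of the walk leaves a subtree of expected size about 2n/3.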
As this shows, each step of the walk shrinks the expected size of the subtree below the current node to roughly 2/3 of what it was. Therefore we expect to take O(lg n) steps to walk to a randomly chosen element: more precisely, about log_{3/2} n steps on average.
Treaps simulate the construction of a random binary search tree. Each node in a treap contains not only a value and pointers to the left and right children, but also a priority. The idea is that a treap always looks like the binary search tree you would get if you had inserted the elements in priority order; if the priorities are generated randomly, the treap has the same structure as the corresponding randomly built binary search tree. In an ordinary binary search tree, elements inserted later always end up lower in the tree; therefore, the nodes of a treap must satisfy the heap ordering invariant on the node priorities. A treap is both a binary search tree with respect to the node elements, and a heap with respect to the node priorities. From this comes its name: "treap" = "tree" + "heap".
Given a set of elements and associated priorities, it is not completely obvious that we can construct a treap that satisfies both invariants simultaneously. Clearly the root of the treap must be the node with the highest priority (in the implementation below, "highest" means the numerically smallest priority value). To satisfy the BST invariant, all the nodes whose keys are less than the root's key must be in its left subtree, and the nodes whose keys are greater must be in its right subtree. We can therefore apply this construction recursively to the left and right subtrees, resulting in a treap.
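This recursive construction can be sketched directly. The following build function is a hypothetical illustration, not part of the implementation below; it assumes the tree datatype, compare, and keyOf defined later, and it is O(n²) in the worst case, so it is for exposition only:

```sml
(* Sketch: construct a treap from a list of (element, priority) pairs by
 * recursively selecting the pair with the smallest (i.e., highest)
 * priority as the root and partitioning the rest by key. *)
fun build [] = Empty
  | build items =
      let
        (* the root is the pair with the smallest priority value *)
        val (e, p) =
          foldl (fn (x as (_, px), y as (_, py)) => if px < py then x else y)
                (hd items) (tl items)
        (* partition the remaining pairs by key, per the BST invariant *)
        val lefts  = List.filter (fn (e', _) => compare(keyOf e', keyOf e) = LESS) items
        val rights = List.filter (fn (e', _) => compare(keyOf e', keyOf e) = GREATER) items
      in
        Node{value = e, priority = p, left = build lefts, right = build rights}
      end
```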
Given an existing treap, how do we insert a new element? The algorithm follows the same strategy as in red-black trees: it finds the unique leaf where the element can be inserted while preserving the BST invariant. However, we also assign this element a random priority. The final treap had better look like the binary search tree that one would get if the newly inserted element had been inserted according to its priority. This is achieved by performing a series of tree rotations to enforce the heap ordering invariant.
Simple tree rotations are also useful to know about for other tree algorithms, such as splay trees and AVL trees. Notice that the following two trees both satisfy the binary search tree invariant, and that all of the elements remain in the same order with respect to an in-order traversal, regardless of the structure of the subtrees A, B, C:
    x                y
   / \              / \
  A   y            x   C
     / \          / \
    B   C        A   B
A tree rotation converts a part of the tree that looks like one of these into the other. The advantage is that the relative position of x and y is swapped by the rotation. Thus, if y has higher priority than x but sits below x, breaking the heap-ordering invariant (as in the left-hand picture), a tree rotation to the right-hand configuration restores the invariant by putting x below y.
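As a standalone illustration, the two rotations can be written on a bare binary tree (a minimal sketch with its own small datatype, separate from the treap code below):

```sml
datatype 'a btree = Leaf | Branch of 'a btree * 'a * 'a btree

(* rotate_left converts the left-hand picture (x above y) into the
 * right-hand one (y above x); rotate_right is its inverse. Both
 * preserve the in-order sequence A, x, B, y, C of the elements. *)
fun rotate_left (Branch(a, x, Branch(b, y, c))) = Branch(Branch(a, x, b), y, c)
  | rotate_left t = t

fun rotate_right (Branch(Branch(a, x, b), y, c)) = Branch(a, x, Branch(b, y, c))
  | rotate_right t = t
```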
Suppose we want to insert the elements 1, 2, 3, 4, 5 into a treap. With an ordinary BST this would result in a very unbalanceded tree. Suppose, however, that the elements receive the priorities 17, 30, 47, 33, 11 respectively. (Recall that in the implementation below a numerically smaller priority is a higher one; in a realistic implementation the priorities would range over all integers.) The tree evolves as follows:
(1,17)     (1,17)       (1,17)
              \            \
             (2,30)       (2,30)
                             \
                            (3,47)
So far no tree rotations have been necessary to enforce the heap ordering invariant. However, this will not be true on the next insertion, because the new element's priority (33) is higher than that of the node above it:
(1,17)              (1,17)
   \                   \
  (2,30)              (2,30)
     \         =>        \
    (3,47)              (4,33)
       \                 /
      (4,33)         (3,47)
The final insertion will rotate the new node all the way to the top, because it has the highest priority of all:
(1,17)           (1,17)          (1,17)          (5,11)
   \                \               \              /
  (2,30)           (2,30)          (5,11)      (1,17)
     \       =>       \       =>    /      =>     \
    (4,33)           (5,11)      (2,30)          (2,30)
    /    \            /             \               \
(3,47)  (5,11)     (4,33)          (4,33)          (4,33)
                    /                /               /
                 (3,47)           (3,47)          (3,47)
Of course, this particular tree doesn't look very balanced, but that is just an artifact of the priorities we used in the example. Typically the tree will be more balanced.
Here is code that implements treaps:
functor Treap(structure Params: ORDERED_SET_PARAMS) = struct
  type key = Params.key
  type elem = Params.elem
  val compare = Params.compare
  val keyOf = Params.keyOf
  type prio = Rand.rand

  datatype tree = Empty
                | Node of {left: tree, right: tree, value: elem, priority: prio}
  type node = {left: tree, right: tree, value: elem, priority: prio}

  (* Rep Invariant:
   * For Node{value,priority,left,right}:
   * 0. Binary search tree: all of the values in the tree "left" have
   *    keys less than the key of "value", and all of the values in
   *    "right" have keys greater than the key of "value".
   * 1. Heap ordering: all of the priorities in the left and right
   *    subtrees are at least as large as "priority".
   *)

  fun lookup(t: tree, k: key): elem option =
    case t of
      Empty => NONE
    | Node {value, priority, left, right} =>
        (case compare(k, keyOf(value)) of
           EQUAL => SOME value
         | LESS => lookup(left, k)
         | GREATER => lookup(right, k))

  fun add(t: tree, e: elem, p: prio): tree * bool = let
    (* Given a < xv < b < yv < c, heap_rotate(xv,xp,yv,yp,a,b,c) is
     * a node for a tree that satisfies the rep invariant and contains
     * all of the elements in question. *)
    fun heap_rotate(xv, xp, yv, yp, a: tree, b: tree, c: tree): node =
      if xp < yp then
        {value = xv, priority = xp, left = a,
         right = Node{value = yv, priority = yp, left = b, right = c}}
      else
        {value = yv, priority = yp, right = c,
         left = Node{value = xv, priority = xp, left = a, right = b}}
    fun add_node(t: tree, e: elem, p: prio): node * bool =
      case t of
        Empty => ({value = e, priority = p, left = Empty, right = Empty}, false)
      | Node{value, priority, left, right} =>
          case compare(keyOf(e), keyOf(value)) of
            EQUAL => ({value = e, priority = priority,
                       left = left, right = right}, true)
          | LESS =>
              let val ({value = xv, priority = xp, left = a, right = b}, dup) =
                        add_node(left, e, p)
              in (heap_rotate(xv, xp, value, priority, a, b, right), dup) end
          | GREATER =>
              let val ({value = yv, priority = yp, left = b, right = c}, dup) =
                        add_node(right, e, p)
              in (heap_rotate(value, priority, yv, yp, left, b, c), dup) end
    val (n, dup) = add_node(t, e, p)
  in
    (Node(n), dup)
  end

  fun first(t: tree): elem option =
    case t of
      Empty => NONE
    | Node{value, priority, left, right} =>
        case first(left) of
          NONE => SOME value
        | eo => eo

  fun fold_forward (f: elem * 'b -> 'b) (b: 'b) (k: key) (t: tree) =
    case t of
      Empty => b
    | Node {value, priority, left, right} =>
        (case compare(keyOf(value), k) of
           EQUAL => fold_forward f (f(value, b)) k right
         | LESS => fold_forward f b k right
         | GREATER =>
             let val lft = fold_forward f b k left
             in fold_forward f (f(value, lft)) k right end)
end
Here, heap_rotate is the function that figures out which of the two tree configurations above is appropriate, given two elements x and y and their associated priorities. This code doesn't actually build the tree nodes for the result until it has to, resulting in some performance improvement. The function add_node walks to the bottom of the tree, then uses heap_rotate as it reconstructs the tree on the way back up, so that the heap ordering invariant is maintained at every step. Note that first and fold_forward would work exactly the same way for any binary search tree.
This code assumes that a priority is provided when elements are added to the data structure. We want this priority to be randomly chosen from a large space so that the tree is likely to be approximately balanced. SML provides some library functions for generating pseudo-random numbers; for this use, it doesn't matter too much how good the pseudo-random number generator is. Here is how we can use a random number generator to produce random treaps, giving a good set implementation. We haven't implemented remove here, but it too is done with rotations: the node to be removed is rotated downward, always lifting the child with the higher priority, until it becomes a leaf, where it can simply be cut off.
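For concreteness, here is one way remove could be sketched inside the Treap functor (a hypothetical addition, not the course implementation; instead of rotating step by step, it merges the two subtrees of the deleted node, which performs exactly the rotations just described):

```sml
(* Sketch of deletion. Assumes the tree datatype, compare, and keyOf
 * from the Treap functor above. When the key is found, the node's two
 * subtrees are merged by repeatedly promoting whichever root has the
 * smaller (i.e., higher) priority -- the same effect as rotating the
 * deleted node down to a leaf and snipping it off. *)
fun remove(t: tree, k: key): tree =
  case t of
    Empty => Empty
  | Node{value, priority, left, right} =>
      case compare(k, keyOf(value)) of
        LESS    => Node{value = value, priority = priority,
                        left = remove(left, k), right = right}
      | GREATER => Node{value = value, priority = priority,
                        left = left, right = remove(right, k)}
      | EQUAL =>
          let fun merge(Empty, r) = r
                | merge(l, Empty) = l
                | merge(l as Node{value = lv, priority = lp, left = ll, right = lr},
                        r as Node{value = rv, priority = rp, left = rl, right = rr}) =
                    if lp < rp
                    then Node{value = lv, priority = lp, left = ll, right = merge(lr, r)}
                    else Node{value = rv, priority = rp, left = merge(l, rl), right = rr}
          in merge(left, right) end
```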
functor TreapSet(structure Params: ORDERED_SET_PARAMS)
  :> ORDERED_FUNCTIONAL_SET where type key = Params.key
                              and type elem = Params.elem
= struct
  type key = Params.key
  type elem = Params.elem
  val compare = Params.compare
  val keyOf = Params.keyOf

  structure T = Treap(structure Params = Params)

  type set = {tree: T.tree, seed: Rand.rand, size: int}

  fun empty() = {tree = T.Empty, seed = 0wx5a5a5, size = 0}

  fun lookup({tree, seed, size}, k) = T.lookup(tree, k)

  fun add({tree, seed, size}, e: elem) = let
      val p = Rand.random(seed)
      val (t', dup) = T.add(tree, e, p)
      val size' = if dup then size else size + 1
    in
      ({tree = t', seed = p, size = size'}, dup)
    end

  fun size({tree, seed, size}) = size
  fun first({tree, seed, size}) = T.first(tree)
  fun remove(t, k) = raise Fail "Not implemented: treap remove"
  fun last(t) = raise Fail "Not implemented: last"

  type 'b folder = ((elem * 'b) -> 'b) -> 'b -> key -> set -> 'b

  fun fold_forward (f: elem * 'b -> 'b) (b: 'b) (k: key) {tree, seed, size} =
    T.fold_forward f b k tree
  fun fold_backward f b k tr = raise Fail "Not implemented: fold backward"
end
The win of treaps is that the code is considerably simpler than the code for red-black trees. Red-black trees are known for being fast, but this implementation of treaps is competitive in speed while being a lot shorter and simpler.
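As a usage sketch, the functor might be instantiated like this (assuming ORDERED_SET_PARAMS asks for exactly the components the functors use — key, elem, compare, and keyOf — which is a reconstruction, since the signature is not shown in this excerpt):

```sml
(* A hypothetical instantiation: an ordered set of ints, where each
 * element serves as its own key. *)
structure IntParams = struct
  type key = int
  type elem = int
  val compare = Int.compare
  fun keyOf(e: elem): key = e
end

structure IntSet = TreapSet(structure Params = IntParams)

(* Example use:
   val s0 = IntSet.empty()
   val (s1, _) = IntSet.add(s0, 3)
   val (s2, _) = IntSet.add(s1, 1)
   val SOME x = IntSet.first(s2)   (* the smallest element, 1 *)
 *)
```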