Today: data structures, continued.
Priority queues and heaps.
Data structures are basic tools of the trade -- storing and accessing data
Want to know the commonly used ones, building blocks of our programs.
Generally the most difficult part of designing a data structure is coming up with a good abstraction.
Last week we looked at stacks and queues
[cafeteria trays = stack, cafeteria line = queue]
This time we're going to do PRIORITY QUEUES [military cafeteria]
* Each element comes with a number, its PRIORITY
* More important things come first,
* use the convention that smaller priority number is more important.
We will look at two different implementations, one using lists and the other using heaps. A heap is a tree structure embedded in a vector.
For both implementations, we will use structures for the elements stored in a priority queue
(defclass <entry> ()
  (prio :type <number>)
  (val :type <object>))

(define make-entry
  (lambda (prio val)
    (make <entry> :prio prio :val val)))
Operations on a priority queue:
make-prioq -- return empty structure
prioq-insert! pq entry -- Put entry in pq
prioq-extract-min! pq --- Remove most urgent entry in pq, and return it
Note that the operations are defined to use side-effects (why?)
First, the list implementation: Keep the lists ordered by increasing priority.
The minimum (most important) element is always the first item.
Use a list with a special token on the front (so the first element of the prioq is actually the second element in the list):
(define make-lprioq
  (lambda ()
    (cons 'lprioq '())))
lprioq-extract-min! simply removes and returns the head of the list
(define lprioq-extract-min!
  (lambda (prioq)
    (let ((data (tail prioq)))
      (if data
          (let ((v (head data)))
            (set! (tail prioq) (tail data))
            v)
          (error "Cannot extract min from empty prioq")))))
lprioq-insert! searches for the right place to insert the new element in the ordered list.
>>> Leave this on the board for a while
( {2 y} {5 e} {17 h} )
{8 a}
Make a new <entry> and splice it into the appropriate place by mutating a tail: (set! (tail cell) ...), which plays the role of set-cdr! in standard Scheme.
Note: the token at the front of the list ('lprioq in the code, written prioq in the pictures) isn't there just for grins. The tail of that first cons cell points to the data list, and an insertion may have to change it. If we'd represented an empty queue as just '(), there wouldn't be a cons cell to change. (A handy "trick".)
So a priority queue in this representation is always a list of at least one element (the token); the actual data is the tail of this list.
There are three cases:
1) The data part (the tail) is null.
(set! (tail prioq) (list entry)))
(prioq {1 x})
2) It's the smallest element, so add it to front of data.
(set! (tail prioq) (cons entry data)))
(prioq {10 x} {12 y})
(prioq {2 a} {10 x} {12 y})
3) It's in the middle somewhere
* Go down the list 'til you find the right place
- You know it's right when the new entry's priority is <= the next item's.
* Splice it into the list
Before:
   ( before . | )
              +----> ( after . tail )

After:
   ( before . | )
              |      ( after . tail )
              |     /
              V    /
          ( new . / )
Extraction is fast -- O(1)
Insertion is potentially slow -- O(n) -- because you might have to search the whole list.
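
A minimal sketch of the ordered-list implementation, written in Python rather than the course's Scheme dialect. The names Node, make_lprioq, lprioq_insert and lprioq_extract_min are illustrative assumptions, and entries are modelled as plain (priority, value) tuples. Because the walk starts at the header cell, the three insertion cases (empty, front, middle) collapse into one loop here.

```python
class Node:
    """One cons cell: an entry plus a link to the rest of the list."""
    def __init__(self, entry, tail=None):
        self.entry = entry   # (priority, value) tuple, or a marker in the header cell
        self.tail = tail     # next Node, or None at the end of the list


def make_lprioq():
    # Header cell holding a marker; the real data hangs off its tail.
    return Node("lprioq")


def lprioq_insert(pq, entry):
    # Walk until the next cell is missing or its priority is >= the new one.
    before = pq
    while before.tail is not None and before.tail.entry[0] < entry[0]:
        before = before.tail
    # Splice: before -> new -> after.
    before.tail = Node(entry, before.tail)


def lprioq_extract_min(pq):
    data = pq.tail
    if data is None:
        raise IndexError("Cannot extract min from empty prioq")
    pq.tail = data.tail      # unlink the first data cell
    return data.entry


pq = make_lprioq()
for entry in [(2, "y"), (5, "e"), (17, "h"), (8, "a")]:
    lprioq_insert(pq, entry)

order = []
while pq.tail is not None:
    order.append(lprioq_extract_min(pq))
print(order)   # [(2, 'y'), (5, 'e'), (8, 'a'), (17, 'h')]
```

The header cell is the same "trick" as the token on the front of the list: there is always a cell whose tail can be changed, even when the queue is empty.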
>>> Warning!
You could give a non-mutating implementation of prioqs that looks basically the same and has the same order of running time, by copying the list structure, BUT that is storage-inefficient: lots of extra CONSing. You would also need to change the semantics of the extract operation a bit. Currently it changes the prioq and returns the removed element; a non-mutating version would have to return both the element and the new prioq.
We can use a prioq to sort n numbers
* Insert them in the queue, with the number as the priority and data both
* Then take them out in priority (= numerical) order.
Time: n insertions, taking O(n) each, for O(n^2)
n deletions, taking O(1) each, for O(n)
Total: O(n^2)
This is more expensive than it needs to be.
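
To make the quadratic cost concrete, here is a rough Python sketch (not the course's Scheme) that sorts through an ordered list and counts comparisons. The function name prioq_sort and the counter are assumptions for illustration. A Python list stands in for the linked list, so reading the finished list front to back plays the role of the n extractions.

```python
def prioq_sort(numbers):
    """Sort by inserting each number into an ordered list (smallest first)."""
    queue = []
    comparisons = 0
    for x in numbers:                 # n insertions ...
        i = 0
        while i < len(queue):         # ... each scanning up to n items
            comparisons += 1
            if x <= queue[i]:         # found the right place
                break
            i += 1
        queue.insert(i, x)
    # The ordered list, read front to back, is the extraction order.
    return queue, comparisons


print(prioq_sort([5, 2, 17, 8])[0])          # [2, 5, 8, 17]

# Already-sorted input is the worst case here: every insertion scans
# the whole list, giving 0 + 1 + ... + 99 comparisons.
print(prioq_sort(list(range(1, 101)))[1])    # 4950
```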
We can implement them more efficiently in a HEAP:
* A `partially ordered tree'
- Each node is no larger than its children
- So the smallest node is on top of the tree.
<<< Let's ignore data for a bit. Numbers are just priorities >>>
        3
      /   \
     /     \
    5       9
   / \     / \
 12   6  10   15
Heaps are easily represented as *vectors*
- 1-dim arrays
Detour; here is how vectors work in Scheme
Type: <vector>
(make-vector n init) --- creates a vector with space for n things, indices 0 to n-1, values are init (or undefined, if init not given)
(vector-ref vec i) -- get i'th element of vec, in O(1) time
(vector-set! vec i val) -- put val at element i of vec, in O(1) time
(define x (make-vector 4 #f))
x ==> [#f #f #f #f]
(vector-set! x 3 0)
x ==> [#f #f #f 0]
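
For comparison only, a rough Python analogue of the same session (an assumption for illustration, not part of the course dialect): a Python list gives constant-time indexed read and write, with None standing in for #f.

```python
x = [None] * 4     # like (make-vector 4 #f): four slots, indices 0 to 3
x[3] = 0           # like (vector-set! x 3 0)
last = x[3]        # like (vector-ref x 3)
print(x)           # [None, None, None, 0]
```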
How do these things work so fast? Real story is in CS314. Short version: this is actually what the hardware does (pretty much exclusively...)
Back to heaps:
The root of the tree is at location 1 in the vector, and the children of the node stored at position i are at locations 2i and 2i+1.
[3 5 9 12 6 10 15]
Read across the tree, row by row.
Partial Ordering Property for heaps
A[i] <= A[2i] and A[i] <= A[2i+1]
for 1 <= i <= floor(n/2)
(the second inequality applies only when 2i+1 <= n, i.e. when the right child exists)
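
The index arithmetic and the ordering property can be checked mechanically. This is a small Python sketch, not the course's Scheme; the names children and is_heap are assumptions, and cell 0 of the list is simply left unused so that the root sits at index 1.

```python
def children(i):
    """Indices of the two children of the node stored at position i."""
    return 2 * i, 2 * i + 1


def is_heap(a, n):
    """True if a[1..n] satisfies the partial ordering property."""
    for i in range(1, n // 2 + 1):
        left, right = children(i)
        if a[i] > a[left]:
            return False
        if right <= n and a[i] > a[right]:   # right child may not exist
            return False
    return True


heap = [None, 3, 5, 9, 12, 6, 10, 15]       # the tree read row by row
print(is_heap(heap, 7))                     # True

broken = [None, 3, 5, 9, 4, 6, 10, 15]      # 4 sits below its parent 5
print(is_heap(broken, 7))                   # False
```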
We'll make our heaps with a limited size, max-size, telling how many elements the prioq can hold at once. The prioq is stored in a fixed-size vector because it's expensive to change the size of a vector.
So we'll need one extra cell to tell how much of the vector is being used. That is cell 0, which is why the root sits at location 1.
(define make-vprioq
  (lambda (size)
    (let ((data (make-vector (+ 1 size))))
      (vector-set! data 0 0)
      data)))
prioq-insert! takes some work:
Put the element at a *leaf*
Switch it with its parent, if its parent is larger [should be below it]
Repeat until done
          3
        /   \
       /     \
      5       9
     / \     / \
   12   6  10   15
   /
  4
[3 5 9 12 6 10 15 4]

          3
        /   \
       /     \
      5       9
     / \     / \
    4   6  10   15
   /
 12
[3 5 9 4 6 10 15 12]

          3
        /   \
       /     \
      4       9
     / \     / \
    5   6  10   15
   /
 12
[3 4 9 5 6 10 15 12]
This operation requires only O(log n) time -- the tree has floor(log2 n) + 1 levels, and we do a bounded amount of work on each level.
NOTE: the tree is balanced -- it can't degenerate into all-left or all-right branching. You can see this from the array embedding: the children are at 2i and 2i+1, so we fill the whole vector left to right, one level at a time.
* Finding your parent is easy:
If you're node i>1, then your parent is floor(i/2) = (quotient i 2)
So the code does the following:
* Check for full queue -- last-used = vector-length - 1 (cell 0 holds the counter)
* Increment last-used
* Store new element there
* Bubble it up 'til (prio parent) <= (prio child)
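
The four steps above can be sketched as follows, in Python rather than the course's Scheme. The names make_vprioq and vprioq_insert are illustrative assumptions, and bare numbers stand in for entries (priorities only, as in the pictures). Cell 0 holds the last-used counter.

```python
def make_vprioq(size):
    data = [None] * (size + 1)
    data[0] = 0                      # cell 0: how many slots are in use
    return data


def vprioq_insert(pq, prio):
    if pq[0] == len(pq) - 1:         # check for a full queue
        raise OverflowError("prioq is full")
    pq[0] += 1                       # increment last-used
    i = pq[0]
    pq[i] = prio                     # store the new element at a leaf
    # Bubble up until the parent (at i // 2) is no larger than the child.
    while i > 1 and pq[i // 2] > pq[i]:
        pq[i // 2], pq[i] = pq[i], pq[i // 2]
        i //= 2


pq = make_vprioq(8)
for p in [3, 5, 9, 12, 6, 10, 15]:
    vprioq_insert(pq, p)
print(pq[1:pq[0] + 1])               # [3, 5, 9, 12, 6, 10, 15]

vprioq_insert(pq, 4)                 # the example from the board
print(pq[1:pq[0] + 1])               # [3, 4, 9, 5, 6, 10, 15, 12]
```

The loop runs at most once per level of the tree, which is where the O(log n) bound comes from.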
extract-min! works by returning the element at the root.
* Guaranteed to be the most important (smallest value) by the partial ordering property.
* Now we have the two subtrees to put right, though.
Trick is,
* Copy a leaf (last element) to the root (first element)
* If it's larger (less important) than one of the children, bubble it down.
- Swap with the more important child, to make sure the parent is always more important than both children.
Here's what the code does:
* Save minimum element, it's the return value
* put last element to first position,
* decrement last element counter
* bubble the new top down the tree 'til it stops.
original heap, to delete top element from (leaves two subheaps)
          3
        /   \
       /     \
      4       9
     / \     / \
    5   6  10   15
   /
 12
[3 4 9 5 6 10 15 12]

copy last leaf to root
         12
        /   \
       /     \
      4       9
     / \     / \
    5   6  10   15
[12 4 9 5 6 10 15]

"push down"
          4
        /   \
       /     \
     12       9
     / \     / \
    5   6  10   15
[4 12 9 5 6 10 15]

          4
        /   \
       /     \
      5       9
     / \     / \
   12   6  10   15
[4 5 9 12 6 10 15]
Again an O(log n) operation.
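
A sketch of extract-min under the same assumptions as before: Python rather than the course's Scheme, bare numbers as priorities, cell 0 as the counter, and vprioq_extract_min as an illustrative name. It replays the example above.

```python
def vprioq_extract_min(pq):
    n = pq[0]
    if n == 0:
        raise IndexError("Cannot extract min from empty prioq")
    smallest = pq[1]                 # save the minimum, it's the return value
    pq[1] = pq[n]                    # put the last element in first position
    pq[n] = None
    n -= 1
    pq[0] = n                        # decrement the last-used counter
    i = 1
    while 2 * i <= n:                # bubble down while there is a child
        child = 2 * i
        if child + 1 <= n and pq[child + 1] < pq[child]:
            child += 1               # pick the more important child
        if pq[i] <= pq[child]:
            break                    # ordering restored
        pq[i], pq[child] = pq[child], pq[i]
        i = child
    return smallest


pq = [8, 3, 4, 9, 5, 6, 10, 15, 12]  # cell 0 = 8 elements in use
print(vprioq_extract_min(pq))        # 3
print(pq[1:pq[0] + 1])               # [4, 5, 9, 12, 6, 10, 15]
```

Swapping with the smaller child is what keeps the parent no larger than both children after the swap.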
We can sort using this implementation of priority queues.
How expensive is the sorting function built from this?
n insertions, at O(log n) cost, for O(n log n) total
n deletions, at O(log n) cost, for O(n log n) total.
Thus, O(n log n) total cost.
It's called HEAPSORT, and it's one of the standard sorting algorithms.
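
Putting the two operations together gives a compact sketch of the sort. This is Python, not the course's Scheme, and it departs from the notes in one labeled way: it uses a growable Python list instead of a fixed-size vector, so there is no full-queue check. The name heapsort is used here only for this illustration.

```python
def heapsort(numbers):
    heap = [0]                           # cell 0 holds the count
    for x in numbers:                    # n insertions, O(log n) each
        heap.append(x)
        heap[0] += 1
        i = heap[0]
        while i > 1 and heap[i // 2] > heap[i]:      # bubble up
            heap[i // 2], heap[i] = heap[i], heap[i // 2]
            i //= 2

    result = []
    while heap[0] > 0:                   # n extractions, O(log n) each
        n = heap[0]
        result.append(heap[1])           # root is the minimum
        heap[1] = heap[n]                # copy last leaf to root
        heap.pop()
        n -= 1
        heap[0] = n
        i = 1
        while 2 * i <= n:                # bubble down
            child = 2 * i
            if child + 1 <= n and heap[child + 1] < heap[child]:
                child += 1
            if heap[i] <= heap[child]:
                break
            heap[i], heap[child] = heap[child], heap[i]
            i = child
    return result


print(heapsort([12, 3, 15, 5, 9, 6, 10, 4]))   # [3, 4, 5, 6, 9, 10, 12, 15]
```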
If you have to sort by doing comparisons only, this is as fast as possible (up to a constant factor).
* There are plenty of other O(n log n) algorithms with different properties
- smaller constant factor
- very fast if the list is already sorted
Some special cases will let you sort in O(n) time, but they're rare (can anyone tell me one?)
Today's concepts:
* Priority queue.
- insert,
- extract-min
* Heap
- partially ordered tree
- vector rep
* vectors
* Heapsort: O(n log n)