Today: data structures, continued.  Priority queues and heaps.

Data structures are basic tools of the trade -- storing and accessing data.
Want to know the commonly used ones, the building blocks of our programs.
Generally the most difficult part of designing a data structure is coming
up with a good abstraction.

Last time we looked at stacks and queues
  * Storing data in the order you get it

This time we're going to do PRIORITY QUEUES,
  * Each element comes with a number, its PRIORITY
  * More important things come *first*,
    - even if they were added later.
  * We use the convention that a smaller priority number is more important.

We will look at two different implementations, one using lists and the
other using heaps.  A heap is a tree structure embedded in a vector.

For both implementations, we will use structures for the elements stored
in a priority queue:

(define-class <entry> (<object>)
  (prio <number>)
  (val <object>))

(define (make-entry <method>)
  (method ((prio <number>) (val <object>))
    (make <entry> prio: prio val: val)))

Operations on a priority queue:
  make-prioq              -- return an empty structure
  prioq-insert! pq entry  -- put entry in pq
  prioq-extract-min! pq   -- remove the most urgent entry in pq,
                             and return it

------------------------------------------------------------------------

First, the list implementation:

Keep the list ordered by increasing priority.  The minimum (most
important) element is always the first item.

Use a list with a special token on the front (so the first element of
the prioq is actually the second element of the list):

(define <lprioq> <list>)

(define (make-lprioq <method>)
  (method ()
    (cons 'lprioq '())))

Operations:

lprioq-extract-min! simply removes and returns the head of the list.
Basically (actual code in handout):

(define (lprioq-extract-min! <method>)
  (method ((prioq <lprioq>))
    (bind (((data <list>) (tail prioq)))
      (if data
          (bind (((v <entry>) (head data)))
            (set! (tail prioq) (tail data))
            v)
          #f))))                ; empty queue: nothing to return

lprioq-insert!
searches for the right place to insert the new element in the ordered
list.

>>> Leave this on the board for a while

  ( {2 y} {5 e} {17 h} )        {8 a}

Make a new <entry> and insert it in the appropriate place with
rebind-tail!

Note: the prioq symbol isn't at the beginning of the list just for
grins.  The tail of that cons cell points to a list which is changed
whenever an insertion is done.  If we'd just used '(), there wouldn't be
a cons cell to change.  (A handy "trick".)  So a priority queue in this
representation is always at least the one-element list (prioq); the
actual data is the tail of this.

There are three cases (code on Web):

1) The data part (the tail) is null.
     (set! (tail prioq) (list entry))
   (prioq) becomes (prioq {1 x})

2) It's the smallest element, so add it to the front of the data.
     (set! (tail prioq) (cons entry data))
   (prioq {10 x} {12 y}) becomes (prioq {2 a} {10 x} {12 y})

3) It's in the middle somewhere.
   * Go down the list 'til you find the right place
     - You know it's right when the entry's priority is <= the item's.
   * Splice it into the list:

       ( before . | )
                  |
                  +----> ( after . tail )

                  ===>

       ( before . | )
                  |
                  +----> ( new . | )
                                 |
                                 +----> ( after . tail )

Extraction is fast -- O(1).
Insertion is potentially slow -- O(n) -- because you might have to
search the whole list.

>>> Warning!
  * Mutators are provided because they can be used to make things fast.
  * But just because something uses mutators doesn't make it fast!!
  * It might be dangerous *and* slow.

You could give a non-mutator implementation of prioq's that looks
basically the same, with the same order running times, by copying the
list structure, BUT that is storage inefficient -- lots of extra
CONSing.

----------------------------------------------------------------------

We can use a prioq to sort n numbers:
  * Insert them in the queue, with the number as both the priority and
    the data.
  * Then take them out in priority (= numerical) order.
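The list implementation above can be sketched in Python for concreteness (the names and details here are my own, not the handout's Dylan code): a list kept sorted by increasing priority gives O(1) extract-min and O(n) insert.

```python
# Hypothetical sketch of the list-based priority queue (not the handout code).
# The list is kept sorted by increasing priority number, so the minimum
# (most important) entry is always at the front.

def make_lprioq():
    return []          # Python lists are mutable, so no 'lprioq token is needed

def lprioq_insert(pq, prio, val):
    # Walk to the first entry whose priority is >= the new one, then splice.
    i = 0
    while i < len(pq) and pq[i][0] < prio:
        i += 1
    pq.insert(i, (prio, val))          # O(n) in the worst case

def lprioq_extract_min(pq):
    return pq.pop(0)                   # head of the list is always the minimum

def prioq_sort(nums):
    # Sort by inserting each number (priority = value), then extracting in order.
    pq = make_lprioq()
    for n in nums:
        lprioq_insert(pq, n, n)
    return [lprioq_extract_min(pq)[1] for _ in range(len(pq))]
```

Inserting {5 e}, {2 y}, {17 h}, {8 a} in that order leaves the queue ordered {2 y} {5 e} {8 a} {17 h}, matching the board example.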
>> Code is in the handout

Time:  O(n) insertions, taking O(n) each, for O(n^2)
       O(n) deletions, taking O(1) each.
Total: O(n^2)

----------------------------------------------------------------------

This is more expensive than it needs to be.  We can implement them more
efficiently in a HEAP:
  * A `partially ordered tree'
    - Each node is no larger than its children
    - So the smallest node is on top of the tree.

<<< Let's ignore data for a bit.  Numbers are just priorities. >>>

             3
           /   \
          /     \
         5       9
        / \     / \
      12   6  10   15

Heaps are easily represented as *vectors* -- 1-dim arrays.

----------------------------------------------------------------------

Detour: here is how vectors work in Dylan

Type: <vector>

Operations:
  (n-vector n)              -- creates a vector with space for n things,
                               indices 0 to n-1
  (index vec i)             -- get the i'th element of vec, in O(1) time
  (index-setter! i vec val) -- put val at element i of vec, in O(1) time

(define (x <vector>) (n-vector 4))
x ==> [#f #f #f #f]
(index-setter! 3 x 0)
x ==> [#f #f #f 0]

----------------------------------------------------------------------

Back to heaps:

The root of the tree is at location 1 in the vector, and the children of
the node stored at position i are at locations 2i and 2i+1.

  [3 5 9 12 6 10 15]

Read across the tree, row by row.

Partial Ordering Property for heaps:
  A[i] <= A[2i] and A[i] <= A[2i+1]    for 1 <= i <= floor(n/2)

We'll make our heaps with a limited size, max-size, telling how many
elements the prioq can hold at once.  A prioq will be stored in a
fixed-size vector, allowing some maximum number of elements in the
prioq, because it's expensive to change the size of a vector.

So we'll need one extra cell to tell how much of the vector is being
used.
  * Index 0 isn't being used for anything, so let's use that.
  * Call it `last-used'.

(define <vprioq> <vector>)

(define (make-vprioq <method>)
  (method ((size <integer>))
    (bind (((data <vector>) (n-vector (+ 1 size))))
      (index-setter! 0 data 0)
      data)))

prioq-insert!
takes some work:
  Put the element at a *leaf*.
  Switch it with its parent, if its parent is larger, etc.

             3
           /   \
          /     \
         5       9        [3 5 9 12 6 10 15 4]
        / \     / \
      12   6  10   15
      /
     4

             3
           /   \
          /     \
         5       9        [3 5 9 4 6 10 15 12]
        / \     / \
       4   6  10   15
      /
    12

             3
           /   \
          /     \
         4       9        [3 4 9 5 6 10 15 12]
        / \     / \
       5   6  10   15
      /
    12

This operation requires only O(log n) time -- the tree has depth
ceil(log n), and we do a bounded amount of work on each level.

NOTE: the tree is balanced -- you can't get all right or left branching
-- you can see this from the array embedding, where the children are at
2i and 2i+1, so we fill the whole vector left-to-right.

  * Finding your parent is easy: if you're node i>1, then your parent is
    floor(i/2) = (quotient i 2)

So the code on the handout does the following:
  * Check for a full queue -- last-used = vector-length
  * Increment last-used
  * Store the new element there
  * Bubble it up 'til (prio parent) <= (prio child)

extract-min! works by returning the element at the root.
  * Guaranteed to be the most important (smallest value) by the partial
    ordering property.
  * Now we have the two subtrees to put right, though.

Trick is,
  * Copy a leaf (the last element) to the root (the first element)
  * If it's larger (less important) than one of the children, bubble it
    down.
    - Swap with the more important child, to make sure the parent is
      always more important than both children.

Here's what the code does (see handout):
  * Save the minimum element; it's the return value
  * Put the last element in the first position
  * Decrement the last-used counter
  * Bubble the new top down the tree 'til it stops.

Original heap, to delete the top element from (leaves two subheaps):

             3
           /   \
          /     \
         4       9        [3 4 9 5 6 10 15 12]
        / \     / \
       5   6  10   15
      /
    12

Copy the last leaf to the root:

            12
           /   \
          /     \
         4       9        [12 4 9 5 6 10 15]
        / \     / \
       5   6  10   15

"Push down":

             4
           /   \
          /     \
        12       9        [4 12 9 5 6 10 15]
        / \     / \
       5   6  10   15

             4
           /   \
          /     \
         5       9        [4 5 9 12 6 10 15]
        / \     / \
      12   6  10   15

Again an O(log n) operation.

We can sort using this implementation of priority queues.  How expensive
is the sorting function built from this?
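The insert and extract-min walkthroughs above can be sketched in Python (the names here are my own; the real code is the Dylan version in the handout), using the same 1-indexed vector embedding with `last-used` stored at index 0:

```python
# Hypothetical sketch of the vector-embedded heap (not the handout code).
# Slot 0 holds last-used; the root is at index 1; the children of node i
# are at 2i and 2i+1, and the parent of node i is i // 2.

def make_vprioq(size):
    return [0] * (size + 1)        # data[0] = last-used, initially 0

def vprioq_insert(pq, prio):
    if pq[0] == len(pq) - 1:
        raise OverflowError("priority queue full")
    pq[0] += 1
    i = pq[0]
    pq[i] = prio                   # put the new element at a leaf
    # Bubble up until the parent is no larger.
    while i > 1 and pq[i // 2] > pq[i]:
        pq[i], pq[i // 2] = pq[i // 2], pq[i]
        i //= 2

def vprioq_extract_min(pq):
    minimum = pq[1]                # root is the smallest, by partial ordering
    pq[1] = pq[pq[0]]              # copy the last leaf to the root
    pq[0] -= 1
    i = 1
    # Bubble down: swap with the smaller child while out of order.
    while 2 * i <= pq[0]:
        c = 2 * i
        if c + 1 <= pq[0] and pq[c + 1] < pq[c]:
            c += 1                 # pick the more important child
        if pq[i] <= pq[c]:
            break
        pq[i], pq[c] = pq[c], pq[i]
        i = c
    return minimum
```

Running the board example -- building [3 5 9 12 6 10 15], inserting 4, then extracting the minimum -- reproduces the vectors shown in the diagrams above.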
n insertions, at O(log n) cost, for O(n log n) total
n deletions,  at O(log n) cost, for O(n log n) total.
Thus, O(n log n) total cost.

It's called HEAPSORT, and it's a standard one.  If you have to sort by
doing comparisons only, this is as fast as possible (up to a constant
factor).
  * There are plenty of other O(n log n) algorithms with different
    properties
    - smaller constant factor
    - very fast if the list is already sorted

Some special cases will let you sort in O(n) time, but they're rare.
(Can anyone tell me one?)

----------------------------------------------------------------------

Today's concepts:
  * Priority queue
    - insert
    - extract-min
  * Heap
    - partially ordered tree
    - vector rep
  * Vectors
  * Heapsort: O(n log n)
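As a footnote, heapsort as described above -- n heap insertions followed by n extract-mins -- can be sketched with Python's standard heapq module, which implements the same array-embedded min-heap (0-indexed rather than 1-indexed):

```python
# Heapsort via the standard-library heap: n O(log n) pushes,
# then n O(log n) pops, for O(n log n) total.
import heapq

def heapsort(nums):
    heap = []
    for n in nums:
        heapq.heappush(heap, n)    # bubble-up insert
    # Pop minimums in order; each pop does the bubble-down repair.
    return [heapq.heappop(heap) for _ in range(len(heap))]
```

(The library's `heapq.heapify` builds the initial heap in O(n) instead of n pushes, which improves the constant factor but not the O(n log n) total.)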