Today: data structures, continued.  Priority queues and heaps.

Data structures are basic tools of the trade -- storing and accessing data.
Want to know the commonly used ones, the building blocks of our programs.
Generally the most difficult part of designing a data structure is coming
up with a good abstraction.

Last time we looked at stacks and queues
  * Storing data in the order you get it

This time we're going to do PRIORITY QUEUES,
  * Each element comes with a number, its PRIORITY
  * More important things come *first*,
    - even if they were added later.
  * We use the convention that a smaller priority number is more important.

We will look at two different implementations, one using lists and the
other using heaps.  A heap is a tree structure embedded in a vector.

For both implementations, we will use structures for the elements stored
in a priority queue:

(define-class <entry> (<object>)
  (prio <number>)
  (val <object>))

(define (make-entry <method>)
  (method ((prio <number>) (val <object>))
    (make <entry> prio: prio val: val)))

Operations on a priority queue:
  make-prioq              -- return an empty structure
  prioq-insert! pq entry  -- put entry in pq
  prioq-extract-min! pq   -- remove the most urgent entry in pq,
                             and return it

------------------------------------------------------------------------

First, the list implementation:

Keep the list ordered by increasing priority.  The minimum (most
important) element is always the first item.

Use a list with a special token on the front (so the first element of
the prioq is actually the second element of the list):

(define <lprioq> <list>)

(define (make-lprioq <method>)
  (method ()
    (cons 'lprioq '())))

Operations:

lprioq-extract-min! simply removes and returns the head of the list.
Basically (actual code in handout):

(define (lprioq-extract-min! <method>)
  (method ((prioq <lprioq>))
    (bind (((data <list>) (tail prioq)))
      (if data
          (bind (((v <entry>) (head data)))
            (set! (tail prioq) (tail data))
            v)
          #f))))                ; empty queue: nothing to return

lprioq-insert!
searches for the right place to insert the new element in the ordered
list.

>>> Leave this on the board for a while

  ( {2 y} {5 e} {17 h} )        {8 a}

Make a new <entry> and insert it in the appropriate place with
rebind-tail!

Note: the prioq symbol isn't at the beginning of the list just for
grins.  The tail of that cons cell points to a list which is changed
whenever an insertion is done.  If we'd just used '(), there wouldn't be
a cons cell to change.  (A handy "trick".)  So a priority queue in this
representation is always at least the one-element list (prioq); the
actual data is the tail of this.

There are three cases (code on Web):

1) The data part (the tail) is null.
     (set! (tail prioq) (list entry))
   (prioq) becomes (prioq {1 x})

2) It's the smallest element, so add it to the front of the data.
     (set! (tail prioq) (cons entry data))
   (prioq {10 x} {12 y}) becomes (prioq {2 a} {10 x} {12 y})

3) It's in the middle somewhere.
   * Go down the list 'til you find the right place
     - You know it's right when the entry's priority is <= the item's.
   * Splice it into the list:

       ( before . | )
                  |
                  +----> ( after . tail )

                  ===>

       ( before . | )
                  |
                  +----> ( new . | )
                                 |
                                 +----> ( after . tail )

Extraction is fast -- O(1).
Insertion is potentially slow -- O(n) -- because you might have to
search the whole list.

>>> Warning!
  * Mutators are provided because they can be used to make things fast.
  * But just because something uses mutators doesn't make it fast!!
  * It might be dangerous *and* slow.

You could give a non-mutator implementation of prioq's that looks
basically the same, with the same order running times, by copying the
list structure, BUT that is storage inefficient -- lots of extra
CONSing.

----------------------------------------------------------------------

We can use a prioq to sort n numbers:
  * Insert them in the queue, with the number as both the priority and
    the data.
  * Then take them out in priority (= numerical) order.
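The list implementation above can be sketched in Python for concreteness (the names and details here are my own, not the handout's Dylan code): a list kept sorted by increasing priority gives O(1) extract-min and O(n) insert.

```python
# Hypothetical sketch of the list-based priority queue (not the handout code).
# The list is kept sorted by increasing priority number, so the minimum
# (most important) entry is always at the front.

def make_lprioq():
    return []          # Python lists are mutable, so no 'lprioq token is needed

def lprioq_insert(pq, prio, val):
    # Walk to the first entry whose priority is >= the new one, then splice.
    i = 0
    while i < len(pq) and pq[i][0] < prio:
        i += 1
    pq.insert(i, (prio, val))          # O(n) in the worst case

def lprioq_extract_min(pq):
    return pq.pop(0)                   # head of the list is always the minimum

def prioq_sort(nums):
    # Sort by inserting each number (priority = value), then extracting in order.
    pq = make_lprioq()
    for n in nums:
        lprioq_insert(pq, n, n)
    return [lprioq_extract_min(pq)[1] for _ in range(len(pq))]
```

Inserting {5 e}, {2 y}, {17 h}, {8 a} in that order leaves the queue ordered {2 y} {5 e} {8 a} {17 h}, matching the board example.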
>> Code is in the handout

Time:  O(n) insertions, taking O(n) each, for O(n^2)
       O(n) deletions, taking O(1) each.
Total: O(n^2)

----------------------------------------------------------------------

This is more expensive than it needs to be.  We can implement them more
efficiently in a HEAP:
  * A `partially ordered tree'
    - Each node is no larger than its children
    - So the smallest node is on top of the tree.

<<< Let's ignore data for a bit.  Numbers are just priorities. >>>

             3
           /   \
          /     \
         5       9
        / \     / \
      12   6  10   15

Heaps are easily represented as *vectors* -- 1-dim arrays.

----------------------------------------------------------------------

Detour: here is how vectors work in Dylan

Type: <vector>

Operations:
  (n-vector n)              -- creates a vector with space for n things,
                               indices 0 to n-1
  (index vec i)             -- get the i'th element of vec, in O(1) time
  (index-setter! i vec val) -- put val at element i of vec, in O(1) time

(define (x <vector>) (n-vector 4))
x ==> [#f #f #f #f]
(index-setter! 3 x 0)
x ==> [#f #f #f 0]

----------------------------------------------------------------------

Back to heaps:

The root of the tree is at location 1 in the vector, and the children of
the node stored at position i are at locations 2i and 2i+1.

  [3 5 9 12 6 10 15]

Read across the tree, row by row.

Partial Ordering Property for heaps:
  A[i] <= A[2i] and A[i] <= A[2i+1]    for 1 <= i <= floor(n/2)

We'll make our heaps with a limited size, max-size, telling how many
elements the prioq can hold at once.  A prioq will be stored in a
fixed-size vector, allowing some maximum number of elements in the
prioq, because it's expensive to change the size of a vector.

So we'll need one extra cell to tell how much of the vector is being
used.
  * Index 0 isn't being used for anything, so let's use that.
  * Call it `last-used'.

(define <vprioq> <vector>)

(define (make-vprioq <method>)
  (method ((size <integer>))
    (bind (((data <vector>) (n-vector (+ 1 size))))
      (index-setter! 0 data 0)
      data)))

prioq-insert!
takes some work:
  Put the element at a *leaf*.
  Switch it with its parent, if its parent is larger, etc.

             3
           /   \
          /     \
         5       9        [3 5 9 12 6 10 15 4]
        / \     / \
      12   6  10   15
      /
     4

             3
           /   \
          /     \
         5       9        [3 5 9 4 6 10 15 12]
        / \     / \
       4   6  10   15
      /
    12

             3
           /   \
          /     \
         4       9        [3 4 9 5 6 10 15 12]
        / \     / \
       5   6  10   15
      /
    12

This operation requires only O(log n) time -- the tree has depth
ceil(log n), and we do a bounded amount of work on each level.

NOTE: the tree is balanced -- you can't get all right or left branching
-- you can see this from the array embedding, where the children are at
2i and 2i+1, so we fill the whole vector left-to-right.

  * Finding your parent is easy: if you're node i>1, then your parent is
    floor(i/2) = (quotient i 2)

So the code on the handout does the following:
  * Check for a full queue -- last-used = vector-length
  * Increment last-used
  * Store the new element there
  * Bubble it up 'til (prio parent) <= (prio child)

extract-min! works by returning the element at the root.
  * Guaranteed to be the most important (smallest value) by the partial
    ordering property.
  * Now we have the two subtrees to put right, though.

Trick is,
  * Copy a leaf (the last element) to the root (the first element)
  * If it's larger (less important) than one of the children, bubble it
    down.
    - Swap with the more important child, to make sure the parent is
      always more important than both children.

Here's what the code does (see handout):
  * Save the minimum element; it's the return value
  * Put the last element in the first position
  * Decrement the last-used counter
  * Bubble the new top down the tree 'til it stops.

Original heap, to delete the top element from (leaves two subheaps):

             3
           /   \
          /     \
         4       9        [3 4 9 5 6 10 15 12]
        / \     / \
       5   6  10   15
      /
    12

Copy the last leaf to the root:

            12
           /   \
          /     \
         4       9        [12 4 9 5 6 10 15]
        / \     / \
       5   6  10   15

"Push down":

             4
           /   \
          /     \
        12       9        [4 12 9 5 6 10 15]
        / \     / \
       5   6  10   15

             4
           /   \
          /     \
         5       9        [4 5 9 12 6 10 15]
        / \     / \
      12   6  10   15

Again an O(log n) operation.

We can sort using this implementation of priority queues.  How expensive
is the sorting function built from this?
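The insert and extract-min walkthroughs above can be sketched in Python (the names here are my own; the real code is the Dylan version in the handout), using the same 1-indexed vector embedding with `last-used` stored at index 0:

```python
# Hypothetical sketch of the vector-embedded heap (not the handout code).
# Slot 0 holds last-used; the root is at index 1; the children of node i
# are at 2i and 2i+1, and the parent of node i is i // 2.

def make_vprioq(size):
    return [0] * (size + 1)        # data[0] = last-used, initially 0

def vprioq_insert(pq, prio):
    if pq[0] == len(pq) - 1:
        raise OverflowError("priority queue full")
    pq[0] += 1
    i = pq[0]
    pq[i] = prio                   # put the new element at a leaf
    # Bubble up until the parent is no larger.
    while i > 1 and pq[i // 2] > pq[i]:
        pq[i], pq[i // 2] = pq[i // 2], pq[i]
        i //= 2

def vprioq_extract_min(pq):
    minimum = pq[1]                # root is the smallest, by partial ordering
    pq[1] = pq[pq[0]]              # copy the last leaf to the root
    pq[0] -= 1
    i = 1
    # Bubble down: swap with the smaller child while out of order.
    while 2 * i <= pq[0]:
        c = 2 * i
        if c + 1 <= pq[0] and pq[c + 1] < pq[c]:
            c += 1                 # pick the more important child
        if pq[i] <= pq[c]:
            break
        pq[i], pq[c] = pq[c], pq[i]
        i = c
    return minimum
```

Running the board example -- building [3 5 9 12 6 10 15], inserting 4, then extracting the minimum -- reproduces the vectors shown in the diagrams above.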
n insertions, at O(log n) cost, for O(n log n) total
n deletions,  at O(log n) cost, for O(n log n) total.
Thus, O(n log n) total cost.

It's called HEAPSORT, and it's a standard one.  If you have to sort by
doing comparisons only, this is as fast as possible (up to a constant
factor).
  * There are plenty of other O(n log n) algorithms with different
    properties
    - smaller constant factor
    - very fast if the list is already sorted

Some special cases will let you sort in O(n) time, but they're rare.
(Can anyone tell me one?)

----------------------------------------------------------------------

Today's concepts:
  * Priority queue
    - insert
    - extract-min
  * Heap
    - partially ordered tree
    - vector rep
  * Vectors
  * Heapsort: O(n log n)
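As a footnote, heapsort as described above -- n heap insertions followed by n extract-mins -- can be sketched with Python's standard heapq module, which implements the same array-embedded min-heap (0-indexed rather than 1-indexed):

```python
# Heapsort via the standard-library heap: n O(log n) pushes,
# then n O(log n) pops, for O(n log n) total.
import heapq

def heapsort(nums):
    heap = []
    for n in nums:
        heapq.heappush(heap, n)    # bubble-up insert
    # Pop minimums in order; each pop does the bubble-down repair.
    return [heapq.heappop(heap) for _ in range(len(heap))]
```

(The library's `heapq.heapify` builds the initial heap in O(n) instead of n pushes, which improves the constant factor but not the O(n log n) total.)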