CS212 Lecture: Streams

New topic area today.  
  Streams:
    * Another way of thinking about state

For the last month or so we've been working with local state
  * procedural abstraction and local state variables
    > a bank account with hidden balance variable
  * Object-oriented programming
    > instance variables of objects
  * Mutable data structures
    > queues and heaps and such

These all have the feel of ``real life'':
  * The state of the moment is visible.
  * Past states are just memories
  * Future states are inaccessible.

For example, we'd simulate

(A) A physical object with some state
    >>> draw a spaceship <<<

as a
(B) A computational object with some state
    >>> (make-spaceship) <<<

Changes in (A) are reflected in (B) and vice versa.


----------------------------------------------------------------------


Today, we'll look at a different view of state:
  * More like a book (or cd-rom)
    - Watch the state in terms of its evolution across time.
    - The whole thing's there if we want to skim forward or backward.


----------------------------------------------------------------------


This view of programming is rather like signal processing.
  * Watch the signal / data flow through a processor/program.

             +--------------+   +-------------+   +----------+ 
{laser       |              |   |             |   |          | 
 light} -->  | CD player    |-->| Preamp      |-->| Amp      |-->sound
             |              |   |             |   |          | 
             +--------------+   +-------------+   +----------+
             
You can think of each of those boxes as something that takes an entire
72-minute signal, and outputs a different 72-minute signal that's
closer to music.  (Or you can look at what is there at each point in time.)

>>> Draw some curves at each of the arrows, some notes at the sound


We're going to do something similar, having information flowing
through a collection of boxes.

We'll look at primitives (kinds of boxes)
  * enumerate [generate a signal]
  * map [turn one signal into another]
  * filter [remove some signal values]
  * accumulate [turn a signal into a scalar]

Idea:
  * Stereos and sound-systems are great because they come in lots of
    boxes and you can hook them together in many many useful ways.
  * We'll try to do the same thing with boxes for streams.

We've looked at some of these (e.g., map) for lists.  Streams will be similar.
  * Initially, they'll just *be* lists
  * But next time we'll let them be infinitely long.
    - We'll have a stream of all the integers.
    - That couldn't possibly fit as a list.

----------------------------------------------------------------------


First, let's look at a pretty standard piece of Scheme code:

  * Sum the squares of the prime integers between 1 and N:

(define sum-prime-squares
  (method ((n <integer>))
    (bind-methods ((next ((k <integer>))
                  (cond
                    ((> k n) 0)
                    ((prime? k) (+ (square k) (next (+ 1 k))))
                    (else: (next (+ 1 k))))))
      (next 1))))

There are four things going on here:
  1. ENUMERATE      the numbers 1 ... n
  2. FILTER         keep only the prime numbers from that list
  3. MAP            square over each of the selected numbers
  4. ACCUMULATE     the result, using +, starting from 0.

Note that filter/map/accumulate are already defined on lists.
We'll write stream versions of them: filters, maps, accumulates.

      +-----+     +-----+      +-----+      +-----+      
n --> |ENUM | --> |FILT | -->  |MAP  | -->  |ACCUM| --> sum
      +-----+     +-----+      +-----+      +-----+         

This pattern is pretty hard to see from the code, though:
  * Everything's going on at once.

We're going to use STREAMS to capture this picture.

FOR NOW -- and this is *WRONG* but a good start --
  think of a stream as a list.

[Another of those 212 "white lies"...]

Here are the operations on STREAMS:

  (cons-stream thing stream)
  (empty-stream? stream)  -- true only of empty-stream
  (heads stream)
  (tails stream)

Contract:
  (heads (cons-stream thing stream)) ==> thing
  (tails (cons-stream thing stream)) ==> stream

Note that so far it looks just like lists.

Now, we can implement our procedure as a bunch of little boxes.  

Here's a stream made from scratch:

>>> Save this to the end of class <<<

(define enumerate-interval
  (method ((low <integer>) (high <integer>))
    (if (> low high)
        empty-stream
        (cons-stream low
          (enumerate-interval (+ low 1) high)))))


We'd like our program to look like that chain of black boxes.  
We'll write it:

(define sum-prime-squares
  (method ((n <integer>))
    (accumulates + 0
                (maps square
                     (filters prime?
                            (enumerate-interval 1 n))))))


A black box is then going to be a Scheme function
  * One input, viz. the stream from the left.
  * One output, the stream going right.
    - For all but the last box, it's going to be a stream.

---------------------------------------------------------------------



Now we need to define each of the boxes.  They will be just like
their list equivalents, except they use <stream>, heads, tails and
cons-stream instead of <list>, head, tail and cons.

>>> Save filter until end of class <<<

(define filters
  (method ((pred <function>) (stream <stream>))
    (cond ((empty-stream? stream) stream)
          ((pred (heads stream))
           (cons-stream (heads stream)
                        (filters pred (tails stream))))
          (else: (filters pred (tails stream))))))

So (filters odd? s) returns a stream of all the odd numbers in
stream s (if any)

(define maps
  (method ((f <function>) (stream <stream>))
    (if (empty-stream? stream) 
        stream
        (cons-stream (f (heads stream)) (maps f (tails stream))))))

(maps square s) returns a stream of the squares of the elements of s

(define accumulates
  (method ((combiner <function>) (init <object>) (stream <stream>))
    (if (empty-stream? stream)
        init
        (combiner (heads stream)
                  (accumulates combiner init (tails stream))))))

Now sum-prime-squares works as diagrammed in the streams diagram.
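
A minimal sketch of the same four boxes in Python rather than the Dylan
used above, sticking to the "streams are just lists" white lie for now.
The names mirror the lecture's; is_prime is an assumed stand-in for
prime?, using plain trial division.

```python
def enumerate_interval(low, high):
    """ENUMERATE: the integers low..high, as an ordinary list."""
    return list(range(low, high + 1))

def is_prime(k):
    """Assumed stand-in for prime?: trial division up to sqrt(k)."""
    if k < 2:
        return False
    d = 2
    while d * d <= k:
        if k % d == 0:
            return False
        d += 1
    return True

def filters(pred, stream):
    """FILTER: keep only the elements that satisfy pred."""
    return [x for x in stream if pred(x)]

def maps(f, stream):
    """MAP: apply f to every element."""
    return [f(x) for x in stream]

def accumulates(combiner, init, stream):
    """ACCUMULATE: fold from the right, like the recursive definition."""
    result = init
    for x in reversed(stream):
        result = combiner(x, result)
    return result

def sum_prime_squares(n):
    # Reads inside-out, in the same order as the chain of boxes.
    return accumulates(lambda a, b: a + b, 0,
                       maps(lambda x: x * x,
                            filters(is_prime,
                                    enumerate_interval(1, n))))

print(sum_prime_squares(10))  # 2*2 + 3*3 + 5*5 + 7*7 = 87
```

Each box builds a whole list before the next box starts, which is
exactly the inefficiency discussed later.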

----------------------------------------------------------------------


So it just looks like we are re-writing a bunch of code that we already
knew how to write another way.  We have this different view of ``boxes''
processing the data. Why are we bothering with this?
  1.  It's a very powerful metaphor.
      * 60% of scientific FORTRAN code fits into this view.
      * the famous and popular Unix system gets major mileage out of
        such a view (pipes)
  2. Once you've got a decent library of stream functions, 
        you can throw together fancy programs quite fast.
      * That's Unix's major win.
  3. It will enable us to talk about computations over infinite data
     structures:
      * "the integers", say. (advert. for next lecture)


----------------------------------------------------------------------


But... implementing streams as lists can be massively inefficient.

Suppose I ask,
  "What is the second prime between 10,000 and 93,000,000?"

>> Use this at end of lecture <<

(heads (tails (filters prime? (enumerate-interval 10000 93000000))))

We end up having to
  1. Create a list of 92,990,001 integers,
  2. Check them ALL for primality, 
  3. Pick the second one!

That's a pretty impressive waste of work. 

How do we do better?

We use a common and very powerful idea:
  BE LAZY! -- but be lazy in a particular way

Specifically, 
  At selected points in the code, we deliver a promise to do something
  rather than actually doing it.

Maybe nobody will actually collect on it!  Then we don't have to do
the work!

The difference between streams and lists is just this:
  * With a stream, the tail is *not* evaluated when you *MAKE* the
    stream
  * It is only evaluated when you *USE* it.
    tails evaluates the tail.
    cons-stream doesn't.

Contrast this with a list.  Same contract, but evaluation happens at
a different time:

  (heads (cons-stream thing stream)) ==> thing  (stream not evaluated here)
  (tails (cons-stream thing stream)) ==> stream  

The tail is a *promise* to evaluate the tail when asked to, not an
actual object.


----------------------------------------------------------------------


We do this by inventing the special form DELAY:
  -- It doesn't follow the usual evaluation rules  (SPECIAL FORM)
  -- We can add it to Dylan easily using macros

(delay stuff) 
  --> delivers a promise to evaluate stuff when it has to.

(force promise)
  --> collects on the promise

(define (x <object>) (delay (/ 1 0)))
  * Not an error,
(force x)
  * gets the division by zero error.

Note that this is a lot like functions (method of no arguments)

(define xx (method () (/ 1 0)))
(xx)

We could define delay this way as a macro, because as we saw macros allow 
the definition of "equivalent forms"

(delay x) is equivalent to (method () x)

(define delay
  (macro (x)
    (list 'method '() x)))

[NOTE: this definition of delay as a macro is just here for completeness,
you are not responsible for knowing how to define macros].

Then force would be defined as a regular function:

(define (force <function>)
  (method ((promise <function>))
    (promise)))
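
A rough Python analogue, as a sketch only: Python has no macros, so
there is no way to write delay itself; the caller has to spell out the
zero-argument function by hand with lambda.  force is then just a call.

```python
def force(promise):
    """Collect on a promise: call the zero-argument function."""
    return promise()

# "(delay (/ 1 0))" written out by hand as a lambda.
x = lambda: 1 / 0      # not an error: the division has not run yet

try:
    force(x)           # now the division actually happens
except ZeroDivisionError:
    print("division by zero happened only when forced")
```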

------------------------------------------------------------------------


Now that we have delay and force, back to streams.  cons-stream is a
special form too, because the second argument isn't evaluated.

(cons-stream thing stream) is equivalent to (cons thing (delay stream))

Which can be defined using the following macro:

(define cons-stream
  (macro (t s)
    (list 'cons t (list 'delay s))))

Then,

(define (heads <function>)
  (method ((s <stream>)) (head s)))

(define (tails <function>)
  (method ((s <stream>)) (force (tail s))))

And 
(define <stream> <list>)

This DELAYED EVALUATION gives a demand-driven computation
  -- Do things when you need to 

    +-----+      +-----+      +-----+      
--> |     | -->  |     | -->  |     | -->  
    +-----+      +-----+      +-----+      

It's more like pulling a string through a bunch of holes, than pouring
water through them
  - you can get as much as you want
  - but you don't get any more.

>>> Point to that 93000000 example <<<
(heads (tails (filters prime? (enumerate-interval 10000 93000000))))

The stream (enumerate-interval 10000 93000000) is a cons cell:

  ( 10000 . {promise to (enumerate-interval 10001 93000000)})

Well, 10000 isn't prime, so filters will ask for the next one in the
stream.  >>> Point to filters

and (tails (10000 . {promise to (enumerate-interval 10001 93000000)}))
forces the tail, which does the next enumerate-interval computation
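
To watch the string being pulled, here is a sketch in Python under some
assumptions: a stream is a (head, thunk) pair, EMPTY stands in for
empty-stream, the caller passes the tail as a lambda (Python cannot
delay an argument by itself), and is_prime is a trial-division stand-in
for prime?.  The counter enumerated is instrumentation, not part of the
lecture's code.

```python
EMPTY = None        # stands in for empty-stream
enumerated = 0      # how many integers enumerate_interval really produced

def cons_stream(head, tail_thunk):
    return (head, tail_thunk)

def heads(s):
    return s[0]

def tails(s):
    return s[1]()   # force the promise

def is_prime(k):
    if k < 2:
        return False
    d = 2
    while d * d <= k:
        if k % d == 0:
            return False
        d += 1
    return True

def enumerate_interval(low, high):
    global enumerated
    if low > high:
        return EMPTY
    enumerated += 1
    return cons_stream(low, lambda: enumerate_interval(low + 1, high))

def filters(pred, s):
    # Skip rejected elements with a loop; a recursive version would hit
    # Python's recursion limit on long runs of rejects.
    while s is not EMPTY and not pred(heads(s)):
        s = tails(s)
    if s is EMPTY:
        return EMPTY
    return cons_stream(heads(s), lambda: filters(pred, tails(s)))

second = heads(tails(filters(is_prime,
                             enumerate_interval(10000, 93000000))))
print(second, enumerated)   # 10009 10
```

Only the integers 10000 through 10009 are ever produced; the other 93
million or so stay as an uncollected promise.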

----------------------------------------------------------------------

Note for section: streams have an asymmetry, namely the head is always
forced.  So for example 
(filters (method (x) (> x 1000000)) (enumerate-interval 1 10000000000))
runs for a long time.


----------------------------------------------------------------------


There's one last inefficiency possible:
  * If we're implementing delay as a function of no arguments (as above),
    what if you need the same element multiple times?
  
    Each time it is computed anew.

    Expensive!  

We MEMOIZE it the first time we compute it --- 
  save result, use it later
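
One way to memoize a promise, sketched in Python.  make_promise,
expensive and calls are hypothetical names for illustration; the idea
is only that the wrapped function runs on the first force and the saved
value is handed back after that.

```python
def make_promise(thunk):
    """Wrap a zero-argument function so its body runs at most once."""
    done = False
    value = None

    def promise():
        nonlocal done, value
        if not done:
            value = thunk()   # first force: do the work
            done = True       # remember that we did it
        return value          # later forces: reuse the saved result

    return promise

calls = 0

def expensive():
    global calls
    calls += 1
    return 6 * 7

p = make_promise(expensive)
print(p(), p(), calls)   # 42 42 1
```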


----------------------------------------------------------------------


STREAMS:
  * Like lists, but they *delay* their tails
    - Only evaluate them when necessary
  * Delay - promise to compute something later
    - When (force x)'ed.

  * Stream operations:
     map
     filter
     etc.

TODAY:  More about streams: 

             Infinite streams
             Pitfalls of delayed evaluation
             Streams and functional vs. imperative programming

Last time:
  * Programming paradigm - streams
  * Same basic contract as a list:
    (heads (cons-stream x str)) = x
    (tails (cons-stream x str)) = str
  * Order of evaluation is DIFFERENT
    (cons-stream x str) --- evaluates x immediately
                                str only when it needs it.
    (tails str) ----- forces evaluation of the tail of the stream.

Delayed Evaluation:
  * Only compute values in the tail when they're needed.
  * Use (delay foo) special form.

    (delay foo) -- makes a promise to compute foo when forced to.
                   MAKE IOU
    (force foo) -- collects on the promise.
                   
Streams gave us a view of a program as SIGNAL PROCESSING: 
  * data going through a chain of boxes


"What is the second prime between 10000 and a zillion?"
                   +--------+    +--------+ 
(10000,zillion) -->|prime?  | -->|second  | ---> answer
                   +--------+    +--------+           


The KEY difference between streams and lists is that with streams the
flow of data is like "pulling a string", only as much computation gets
done as is needed to provide the next output.

----------------------------------------------------------------------


If the stream we've made doesn't do any computation, what's the point
of that zillion anyways?
  * With lists, it'd be there to stop the recursion and keep the lists
    finite
  * With streams, the *delay* has already stopped the recursion

    - Don't make more stream until you need it!
  * So we don't need another device to stop it.

Just make the stream *infinitely* long!
  - That is, whenever you take the tail of it 
    -- asking for the next value -- there'll be a next value.

(define integers-from
  (method ((n <integer>))
    (cons-stream n (integers-from (+ 1 n)))))

(define (integers <stream>) (integers-from 1))

* integers is an infinite stream:
    -- There's always another integer to pull off of it.
* But it's not an infinite *loop*:
    -- No call to (integers-from 2), because cons-stream delays it.
    -- compare with CONS


integers is bound to something that looks like:

  ( 1 . {promise (integers-from (+ 1 1))} )


Let's print some of this stream out:

(define (print-stream <function>)
  (method ((s <stream>))
    (print (heads s))
    (print-stream (tails s))))

[Note: this is like (maps print s), except that maps is lazy: it would
print only the head until someone forces the tails.]

(print-stream integers)

  * Prints 1
  * Forces the tail, the promise, 
    - Evaluates (+ 1 1) to 2
    - evaluates integers-from, giving us 
       (2 . {promise (integers-from (+ 1 2))})
  * Prints 2
  * Forces the tail...
    - (3 . {promise (integers-from (+ 1 3))})
  * Prints 3
  <etc>
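
A sketch of the same infinite stream in Python, assuming the (head,
thunk) representation with the tail passed explicitly as a lambda.
take is a hypothetical helper that pulls off the first few elements, so
we can look at a finite piece instead of printing forever.

```python
def cons_stream(head, tail_thunk):
    return (head, tail_thunk)

def heads(s):
    return s[0]

def tails(s):
    return s[1]()   # force the promise

def integers_from(n):
    # No infinite loop: the recursive call sits inside the lambda and
    # runs only when someone asks for the tail.
    return cons_stream(n, lambda: integers_from(n + 1))

def take(s, count):
    """Hypothetical helper: the first count elements, as a list."""
    out = []
    while count > 0:
        out.append(heads(s))
        count -= 1
        if count > 0:
            s = tails(s)   # force only as many tails as needed
    return out

integers = integers_from(1)
print(take(integers, 5))   # [1, 2, 3, 4, 5]
```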


----------------------------------------------------------------------


Stream methods MOSTLY make sense on infinite streams.

For example, 

(define (divisible? <function>)
  (method ((x <integer>) (n <integer>))
      (= (remainder x n) 0)))

(define (threes <stream>)
  (filters (method ((x <integer>)) (divisible? x 3)) 
    integers))

Is a stream with the elements:

3 6 9 12 15 18 ...

No matter how many you get out of the stream, there are always more.

PHILOSOPHY:
  * Are all the numbers really there?
  * Well, what does "really there" mean?        
    1. If you look at them, you'll find them.
    2. But if you don't look at them, they're not explicitly
       represented in the computer -- not as numbers anyways.

----------------------------------------------------------------------


It's easy to count by 3's.  Let's do something more interesting:
  * Sieve of Eratosthenes (3rd century BC)
  * Build a stream of primes.

* 2 is prime.
* A number n>2 is prime iff it is not divisible by any prime number
  smaller than itself.

  2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ...
  ^ ^ | ^ | ^ | | |  ^  |  ^  |  |  |  ^  |  ^  |
      2   2   2 3 2     2     2  3  2     2     2
  (^ = prime; | = crossed out by the prime shown underneath)

  We can look at this as a recursive process: the head of the stream is in
   the result, and the rest of the result is the sieve of the tail, filtered
   to remove anything divisible by the head.


(define (sieve <function>)
  (method ((stream <stream>))
    (cons-stream
      (heads stream)                                   
      (sieve (filters (method ((x <integer>))
                              (not (divisible? x (heads stream))))
                            (tails stream))))))


The new tail is the sieve of everything in the old tail that is not
divisible by the current head.
                    

(define (primes <stream>) (sieve (integers-from 2)))

  1. First in the stream is 2.
     And the tail of the stream has all the integers not divisible by 2

[This is an example where you need to really understand the delays
 to make sense of it.]
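
The sieve can be sketched in Python the same way, assuming (head,
thunk) streams with explicit lambdas for the delays.  This filters has
no empty-stream check because every stream here is infinite; take is a
hypothetical helper for looking at a finite prefix.

```python
def cons_stream(head, tail_thunk):
    return (head, tail_thunk)

def heads(s):
    return s[0]

def tails(s):
    return s[1]()   # force the promise

def integers_from(n):
    return cons_stream(n, lambda: integers_from(n + 1))

def filters(pred, s):
    # Infinite streams only: keep pulling until something passes.
    while not pred(heads(s)):
        s = tails(s)
    return cons_stream(heads(s), lambda: filters(pred, tails(s)))

def sieve(s):
    p = heads(s)    # the head is prime
    # The tail: sieve whatever in the old tail is not divisible by p.
    return cons_stream(
        p, lambda: sieve(filters(lambda x: x % p != 0, tails(s))))

def take(s, count):
    """Hypothetical helper: the first count elements, as a list."""
    out = []
    while count > 0:
        out.append(heads(s))
        count -= 1
        if count > 0:
            s = tails(s)
    return out

primes = sieve(integers_from(2))
print(take(primes, 10))   # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

Each prime found adds one more filter layer to the stream, so this is a
demonstration of the delays, not a fast way to find primes.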

----------------------------------------------------------------------


Now, something even more peculiar:

  * Defining a stream in terms of itself.

  We *didn't* do this with integers-from -- 
   -- there we defined a procedure returning a stream.

(define (ones <stream>) (cons-stream 1 ones))

ones looks like 1 1 1 1 1 1 ...

ASIDE:
  If we tried this with regular cons, it would not work -- WHY????????
   -- it'd get the old value of ones and stick a 1 on that.
   -- Or an error if there wasn't one.
  But cons-stream delays the second argument 
   -- and it's as easy to delay `ones' as anything else.

Let's define adding streams:

(define (add-streams <function>)
  (method ((a <stream>) (b <stream>))
    (cond ((empty-stream? a) b)
          ((empty-stream? b) a)
          (else: (cons-stream (+ (heads a) (heads b))
                              (add-streams (tails a) (tails b)))))))

(define (integers <stream>)
   (cons-stream 1
     (add-streams ones integers)))

integers ---> (1 . {promise to (add-streams ones integers)})


(tails integers)

  (2 . {promise to (add-streams (tails ones) (tails integers))} )

(tails (tails integers))
;; Add the 1 and 2

  (3 . {promise to (add-streams (tails (tails ones)) (tails (tails integers)))})


This isn't very different from having a *method* defined in terms of
itself --- recursion.
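
A Python sketch of the self-referential definitions, under the usual
assumptions here: (head, thunk) streams, delays written as lambdas, and
a hypothetical take helper.  The name ones inside the lambda is looked
up only when the tail is forced, by which time ones is defined.  The
empty-stream cases of add-streams are left out because both inputs are
infinite.

```python
def cons_stream(head, tail_thunk):
    return (head, tail_thunk)

def heads(s):
    return s[0]

def tails(s):
    return s[1]()   # force the promise

def add_streams(a, b):
    """Element-wise sum of two infinite streams."""
    return cons_stream(heads(a) + heads(b),
                       lambda: add_streams(tails(a), tails(b)))

def take(s, count):
    """Hypothetical helper: the first count elements, as a list."""
    out = []
    while count > 0:
        out.append(heads(s))
        count -= 1
        if count > 0:
            s = tails(s)
    return out

# Each stream mentions itself, but only inside the delayed tail.
ones = cons_stream(1, lambda: ones)
integers = cons_stream(1, lambda: add_streams(ones, integers))

print(take(ones, 4))       # [1, 1, 1, 1]
print(take(integers, 5))   # [1, 2, 3, 4, 5]
```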

----------------------------------------------------------------------


Now, let's do something a bit more involved using integer streams: Fibonacci numbers

F(n)=F(n-1)+F(n-2)
F(0)=0
F(1)=1

(define (fibs <stream>)
  (cons-stream 0
               (cons-stream 1
                            (add-streams fibs (tails fibs)))))

When we ask for the third element, add-streams adds the first
and second.  The tail is a promise to 
(add-streams (tails fibs) (tails (tails fibs)))

In this case we need two values to ``get the stream started''.
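
The same definition sketched in Python, again assuming (head, thunk)
streams with lambdas for the delays and a hypothetical take helper.
The promises here are not memoized, so earlier elements get recomputed
on every pull; that is fine for ten elements and gets slow quickly
after that.

```python
def cons_stream(head, tail_thunk):
    return (head, tail_thunk)

def heads(s):
    return s[0]

def tails(s):
    return s[1]()   # force the promise (no memoization)

def add_streams(a, b):
    """Element-wise sum of two infinite streams."""
    return cons_stream(heads(a) + heads(b),
                       lambda: add_streams(tails(a), tails(b)))

def take(s, count):
    """Hypothetical helper: the first count elements, as a list."""
    out = []
    while count > 0:
        out.append(heads(s))
        count -= 1
        if count > 0:
            s = tails(s)
    return out

# Two explicit values get the stream started; the rest is fibs added
# to itself shifted by one.
fibs = cons_stream(
    0, lambda: cons_stream(
        1, lambda: add_streams(fibs, tails(fibs))))

print(take(fibs, 10))   # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```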

----------------------------------------------------------------------


There are some problems though:

It's very easy to make *divergent* computations:
   * Streams with nothing in them,  
     - which are *not* the same as being empty
     - And, worse, empty-stream? can't detect them!

(define (lose <stream>) 
  (filters odd? (filters even? integers)))

  * Anything that gets past the first filters won't pass the second.
  * Runs forever trying to find a number that is both odd and even.

[Note: it isn't (heads lose) that runs forever; the definition of lose
itself does.  filters always runs until it finds a value for the head, so
the odd? filters never returns.]

----------------------------------------------------------------------


Delayed evaluation can cause all kinds of trouble when mixed with
assignment (set!)
  * Kind of like drinking and driving.

(define (x <symbol>) 'fun)
(define (make-empty <function>)
  (method ()
    (set! x 'yow!)
    empty-stream))

(define (y <stream>) (cons-stream 1 (make-empty)))

x ===> 'fun

y ===> {printed representation of a stream with head 1}

x ===> 'fun

(tails y) ===> empty-stream

x ===> 'yow!


This is really strange.  
  * Just looking at the tail of y changed x.


  * Side effects happen at all kinds of random times.
  * When you print one variable to see what its value is, you might
    change some other ones.  
  * You can't trust *anything*
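
The same surprise, sketched in Python with a global variable playing
the part of set!.  EMPTY and the (head, thunk) representation are
assumptions of the sketch; before and after are there only to record
what x was at each point.

```python
EMPTY = None   # stands in for empty-stream

def cons_stream(head, tail_thunk):
    return (head, tail_thunk)

def tails(s):
    return s[1]()   # force the promise

x = "fun"

def make_empty():
    global x
    x = "yow!"      # the side effect hides inside the delayed tail
    return EMPTY

y = cons_stream(1, lambda: make_empty())

before = x          # still "fun": building y ran no side effect
rest = tails(y)     # merely looking at the tail...
after = x           # ...changed x to "yow!"

print(before, after)   # fun yow!
```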

----------------------------------------------------------------------


Great Functional vs. Imperative Programming Debate:

Functional Programming: 
  * All functions are as close as possible to mathematical functions.
  * Dylan without set! (and its relatives)
  + Has substitution model semantics. 

Imperative programming:
  * Standard style
  * Assignments, 
  * OO, etc.

(FP): Side effects cause trouble, in many ways.  They mess up your
   thinking.  Give up the habit.

(IMP): I'll be good and just use side effects for local state, like o-o
   style.  No (or few) global variables. 
   But I really *need* that state.  Can't do good OO programming without
   it.

(FP): Have I got "state" for you!  Look at a bank account as being an
   infinite stream of transactions and the resulting balances.  
   Each new transaction comes along and the balance changes
   accordingly.  
   
                    +--------+ 
   transactions --> |ACCT    | --> balances
                    +--------+           
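
One way the ACCT box might look, as a Python sketch with no assignment
to any account variable: the balance lives in the argument passed along
the stream.  balances, alternate and take are hypothetical names, and
the transaction stream (deposit 50, withdraw 20, repeat) is made up for
the example.

```python
def cons_stream(head, tail_thunk):
    return (head, tail_thunk)

def heads(s):
    return s[0]

def tails(s):
    return s[1]()   # force the promise

def alternate(a, b):
    """Made-up infinite transaction stream: a, b, a, b, ..."""
    return cons_stream(a, lambda: alternate(b, a))

def balances(start, transactions):
    """ACCT box: the stream of balances after each transaction."""
    new_balance = start + heads(transactions)
    return cons_stream(
        new_balance,
        lambda: balances(new_balance, tails(transactions)))

def take(s, count):
    """Hypothetical helper: the first count elements, as a list."""
    out = []
    while count > 0:
        out.append(heads(s))
        count -= 1
        if count > 0:
            s = tails(s)
    return out

print(take(balances(100, alternate(50, -20)), 4))   # [150, 130, 180, 160]
```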

What about a shared account?
Well, we could have more than one input stream.  

          +--------+       +------+  
Chris --->|MERGE   | ----> | ACCT | --> balances
  Pat --->|        |       +------+        
          +--------+          

OK, but what's that MERGE critter like?

* Alternating is no good 
  - Then Chris can't deposit money and immediately withdraw it, 
    Pat has to do something in between, and maybe Pat isn't even at
    the bank.

* Fair merge -- ask each one if they're ready to give input.
  - If one is ready and the other isn't, take the one that is.
  - Otherwise, alternate or something.

Note: notion of *time* has re-entered

But this merge box knows about state!  
  * It can tell if one is ready and the other isn't (ready/not-ready
    two states)
  * That's not functional!

The debate isn't over.   
We don't know *what* to do, actually.


----------------------------------------------------------------------


Summary:
  * Delayed evaluation is a powerful tool for allowing certain
    abstractions to be efficient.

  * Delayed evaluation works badly with set!

  * You can often view state as time-evolution of a process
    - and package it in a stream
    instead of as explicit state.