Streams and Lazy Evaluation
---------------------------

New topic area today.  Streams:
  * Another way of thinking about state

For the last month or so we've been working with local state
  * procedural abstraction and local state variables
      > a bank account with hidden balance variable
  * Object-oriented programming
      > instance variables of objects

Today, we'll look at a different view of state:
  * More like a book or cd-rom - EVOLVES across time

This view of programming is rather like signal processing.
  * Watch the signal/data flow through a processor/program.
  * It is a STREAM that gets converted to other kinds of STREAMS
    along the way.

        +--------+         +--------+        +-----+ ampli- +---------+
laser   |   CD   | digital |  D/A   | audio  |     | fied   |         |
light-->| player |-------->|  con-  |------->| amp |------->| speaker |-->sound
        |        | signal  | verter | signal |     | audio  |         |
        +--------+         +--------+        +-----+ signal +---------+

You can think of each of those boxes as something that takes an entire
72-minute signal, and outputs a different 72-minute signal that's
closer to music.  The boxes can begin processing the earlier part of
the stream before the later parts are even produced.

We're going to do something similar, having information flowing
through a collection of boxes.  We'll look at primitives (kinds of
boxes)
  * enumerate   [generate a signal]
  * map         [transform one signal to another]
  * filter      [remove some signal values]
  * accumulate  [turn a signal into a scalar]

Idea:
  * Stereo systems are great 'cause they come in lots of boxes and you
    can hook them together in many many useful ways.
  * We'll try to do the same thing with boxes for streams.

We've looked at some of these (e.g. map) for lists.  Streams will be
similar.
  * Initially, they'll just *be* lists
  * But next time we'll let them be infinitely long: (1 1 1 1 1 ...
      - We'll have a stream of all nonnegative integers: (0 1 2 3 4 5 ...
      - That couldn't possibly fit in memory as a list.  We'll have to
        produce the elements one at a time as we need them.
----------------------------------------------------------------------

First, let's look at a pretty standard piece of Scheme code:
  * Sum the squares of the prime integers between 1 and n:

    (define (sum-prime-squares n)
      (define (f k sum)
        (cond ((> k n) sum)
              ((prime? k) (f (+ 1 k) (+ sum (square k))))
              (else (f (+ 1 k) sum))))
      (f 1 0))

There are four things going on here:
  1. ENUMERATE the numbers 1...n
  2. FILTER out the nonprimes from that list
  3. MAP square on each of the selected numbers
  4. ACCUMULATE the result, using +, starting from 0.

Note that filter/map/accumulate are already defined on lists.  We'll
make them generics to handle streams.

      +----+     +----+     +---+     +-----+
n --> |ENUM| --> |FILT| --> |MAP| --> |ACCUM| --> sum
      +----+     +----+     +---+     +-----+

This pattern is pretty hard to see from the code, though:
  * Everything's going on at once.

We're going to use STREAMS to capture this picture.

FOR NOW -- and this is *WRONG* but a good start -- think of a stream
as a list.

Here are the operations on streams (notice similarity with lists):

    (cons-stream obj stream)
    (emptys? stream)
    (heads stream)
    (tails stream)
    emptys

Contract:

    (heads (cons-stream obj stream))    ==> obj
    (tails (cons-stream obj stream))    ==> stream
    (emptys? emptys)                    ==> #t
    (emptys? (cons-stream obj stream))  ==> #f

So far it looks just like lists.  But we're not telling you our
implementation yet.  For now, just pretend they're lists.

We can now implement our procedure as a chain of little black boxes.

    ;; produces stream (low ... high)
    (define (interval low high)
      (if (> low high)
          emptys
          (cons-stream low (interval (+ 1 low) high))))

    (define (sum-prime-squares n)
      (accumulates + 0
                   (maps square
                         (filters prime?
                                  (interval 1 n)))))

A black box is then going to be a Scheme function
  * One input, viz. the stream from the left.
  * One output, the stream going right.
      - For all but the last box, the output is going to be a stream.

---------------------------------------------------------------------

Now we need to define each of the boxes.
They will be just like their list equivalents, except they work on
streams instead of lists.

    (define (filters test stream)
      (cond ((emptys? stream) emptys)
            ((test (heads stream))
             (cons-stream (heads stream)
                          (filters test (tails stream))))
            (else (filters test (tails stream)))))

So (filters odd? s) returns a stream of all the odd numbers in stream
s, if any.

    (define (maps f stream)
      (if (emptys? stream)
          emptys
          (cons-stream (f (heads stream))
                       (maps f (tails stream)))))

(maps square s) returns a stream of all the elements of s squared.

    (define (accumulates combiner init stream)
      (if (emptys? stream)
          init
          (combiner (heads stream)
                    (accumulates combiner init (tails stream)))))

Now sum-prime-squares works as diagrammed in the streams diagram.

----------------------------------------------------------------------

So it just looks like we are rewriting a bunch of code that we already
knew how to write another way.  We have this different view of
``boxes'' processing the data.

Why are we bothering with this?

  1. It's a very powerful metaphor.
       * most scientific FORTRAN code fits into this view.
       * the famous and popular Unix system gets major mileage out of
         such a view (pipes)
  2. Once you've got a decent library of stream functions, you can
     throw together fancy programs quite fast.
       * That's Unix's major win.
  3. It will enable us to talk about computations over INFINITE data
     structures:
       * "the integers", say.  (advert. for next lecture)

Lazy Evaluation
---------------

Implementing streams as lists can be massively inefficient:
  * the whole list is produced before any computation is done.

Suppose I ask, "What is the second prime between 10,000 and
93,000,000?"

    (heads (tails (filters prime? (interval 10000 93000000))))

We end up having to
  1. create a list of 92,990,001 integers,
  2. check them ALL for primality,
  3. pick the second prime!

That's a pretty impressive waste of work.

How do we do better?  We use a common and very powerful idea:

        BE LAZY!
        -- but be lazy in a particular way

Specifically: at selected points in the code, we deliver a promise to
do something rather than actually doing it.  Maybe nobody will
actually collect on it!  Then we don't have to do the work!

The difference between streams and lists is just this:
  * With a stream, the tail is *not* evaluated when you *MAKE* the
    stream
  * It is only evaluated when you *USE* it.

tails evaluates the tail; cons-stream doesn't.

Contrast this with a list.  Same contract, but EVALUATION HAPPENS AT A
DIFFERENT TIME.

    (heads (cons-stream obj stream)) ==> obj     (stream not evaluated yet)
    (tails (cons-stream obj stream)) ==> stream  (now it is evaluated)

The tail is a PROMISE to evaluate the tail when asked to, not the
actual value.

We need to do this for infinite streams: since the whole thing cannot
exist at one time, we have to construct it as we go along.

With (finite) lists, the whole list exists all at once.
With (finite or infinite) streams, the head exists, but the tail is
not created until needed.

This DELAYED EVALUATION gives a demand-driven computation
  -- Do things when you need to

    +-----+     +-----+     +-----+
--> |     | --> |     | --> |     | -->
    +-----+     +-----+     +-----+

It's more like pulling a string through a bunch of holes than pouring
water through them
  - you can get as much as you want
  - but you don't get any more.

----------------------------------------------------------------------

We implement this by inventing a new special form DELAY:
  -- It doesn't follow the usual evaluation rules--it is a SPECIAL FORM
  -- Standard Scheme already provides delay and force, but we'll
     pretend it doesn't, since we can define them easily with macros

    (delay x) --> delivers a promise to evaluate x when it has to.
    (force y) --> collects on the promise

    (define y (delay (/ 1 0)))    ; not an error
    (force y)                     ; gives error: division by zero

Note that this is a lot like lambdas with no arguments:

    (define yy (lambda () (/ 1 0)))   ; no error
    (yy)                              ; ==> error: div by 0.
In fact, this is how we could implement it:

    (defmacro (delay exp) (list 'lambda '() exp))

so (delay expr) is a macro expanding to (lambda () expr)

    (define (force promise) (promise))

so expr can be evaluated LATER by calling this function with no args.
That's exactly what force does.

This isn't quite the way that we implement delay and force -- we'll
see why in a minute, but basically, even though this is lazy, it's not
really lazy enough.

----------------------------------------------------------------

Now that we have delay and force, back to streams.  Here is our
complete definition:

    (defclass )
    (defclass ( ))
    (defclass ( )
      (heads :type :initarg :heads :accessor get-heads)
      (tails :type :initarg :tails :accessor get-tails))

Note: the tails slot of a pair stream holds a delayed stream, i.e.
something that, when forced (called with no arguments), gives a
stream.

    (define (emptys? s) (instance-of? s ))
    (define emptys (make ))

    (defmacro (cons-stream obj str)
      (list 'make ' ':heads obj ':tails (list 'delay str)))

    (define heads get-heads)
    (define (tails s) (force (get-tails s)))

cons-stream must be a macro, because we don't want to evaluate the
second argument.  The macro call (cons-stream obj stream) expands to a
make call whose :heads argument is obj and whose :tails argument is
(delay stream).

----------------------------------------------------------------

Abusing the notation for pairs, let's abbreviate a pair stream with
head h and delayed tail f by (h . f)

Then

    (cons-stream (+ 1 1) (+ 2 2))
      ==> (make :heads (+ 1 1) :tails (delay (+ 2 2)))
      ==> (make :heads 2 :tails (lambda () (+ 2 2)))
      ==> (2 . (lambda () (+ 2 2)))

The head is evaluated, but the tail is a promise to evaluate.

Now back to our query:

    (heads (tails (filters prime? (interval 10000 93000000))))

The expression (interval 10000 93000000) evaluates to the pair stream

    (10000 . (lambda () (interval 10001 93000000)))

Well, 10000 isn't prime, so filters will ask for the next one in the
stream, and

    (tails (10000 .
            (lambda () (interval 10001 93000000))))

forces the tail, which does the next interval computation.

----------------------------------------------------------------------

Note: streams have an asymmetry, namely:
  * the head is always forced
  * the tail isn't until needed

So for example

    (filters (method (x) (> x 1000000)) (interval 1 10000000000))

runs for a long time: filters has to find the head of its result right
away, and that means stepping past the first 1,000,000 elements.

----------------------------------------------------------------------

There's one last inefficiency we should mention:
  * If we're implementing delay as a function of no arguments (as
    above), what if you need the same element multiple times?
    Each time it is computed anew.  --> Expensive!

We MEMOIZE it the first time we compute it --- save the result, use it
later.

----------------------------------------------------------------------

STREAMS:
  * Like lists, but they *delay* their tails
      - Only evaluate them when necessary
  * Delay
      - promise to compute something later
      - When (force x)'ed.
  * Stream operations: maps filters etc.

NEXT TIME: infinite streams.  E.g.

    ones     = (1 1 1 1 1 ...
               defined by: (define (ones ) (pair-stream 1 ones))
    naturals = (0 1 2 3 4 ...
    primes   = (2 3 5 7 11 13 ...
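
----------------------------------------------------------------------

Going back to the memoization point above, here is a minimal sketch of
what "save the result, use it later" could look like if a promise is
just a function of no arguments.  The names make-memo-promise and
force-memo are made up for this illustration; they are not the course
library's delay and force, and a real delay would be a macro wrapped
around something like this.

```scheme
;; A sketch only: a promise is a closure that remembers its result.
;; make-memo-promise and force-memo are hypothetical names.
(define (make-memo-promise thunk)
  (let ((done? #f)      ; has the thunk been run yet?
        (value #f))     ; cached result once done? is #t
    (lambda ()
      (if (not done?)
          (begin
            (set! value (thunk))
            (set! done? #t)))
      value)))

(define (force-memo promise) (promise))

;; Count how many times the delayed expression actually runs.
(define run-count 0)

(define p
  (make-memo-promise
   (lambda ()
     (set! run-count (+ run-count 1))
     (* 6 7))))

(force-memo p)   ; first force: runs the thunk and caches the result
(force-memo p)   ; second force: returns the cached result
;; run-count is 1 here, not 2
```

With the plain (lambda () expr) version, every force would rerun the
expression; here the work is done at most once per promise, which is
what matters when the same stream tail is asked for more than once.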