Lecture 24
Streams

The term stream is used in several ways in computer science. Here we look at streams that are ordered collections of elements, very similar to lists, except that the tail (the rest) of a stream is evaluated only when it is needed, rather than immediately. This allows us to do things such as create infinite data structures.

This statement might sound surprising initially, since any data structure that we want to represent must be finite for obvious reasons. There are applications, however, in which it is more convenient to structure the computation assuming that we have one or more infinite data sources, from which we can extract as much data as we need for the problem at hand. If we are looking, for example, for the smallest prime number with a particular property, we might not know how far into the infinite sequence of ordered prime numbers we need to search to find it. In such a situation, it might make sense to conceive of our computation as using a potentially infinite data source that produces prime numbers on demand. A stream is such a data source.

We stress that "infinite" and "infinity" refer to potential (theoretical) infinity, and not actual infinity. Due to the obvious limitations of various resources (time, memory, and others), no computation will produce an infinite sequence of values. Our streams are infinite in potential, not in fact.

Before we define the stream datatype, let us think for a minute about how we could specify an infinite stream of values. It is immediately clear that we cannot actually enumerate these values, so we are left with the alternative of providing a method for computing them. This means that our streams must rely on an infinity of function calls. This is only possible if (at least some of) the functions involved are directly or indirectly recursive. Now, the infinite sequence of function calls cannot run to the "end" (there is no end of infinity, of course); at any given time, only a finite number of such calls can have been initiated. Thus we must have a mechanism for (temporarily) stopping further recursive calls. If we understand how we can suspend, and later resume, the sequence of computations that generates the stream values, then we can write streams in SML.

Consider the following definition:

datatype 'a stream = Null | Cons of 'a * (unit -> 'a stream)

First, note that this definition is very similar to our definition of custom lists from early in the semester.

We see that a stream can be either empty (Null), or it can consist of a pair. The first element of the pair is a value (the head of the stream), while the second element of the pair is a 0-argument function that produces a stream. The stream that the 0-argument function returns is the tail (the rest) of our original stream.
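The following is a minimal sketch of how a suspended tail behaves. The stream s and the name second are illustrative only; the print call is there just to make the moment of evaluation visible.

```sml
datatype 'a stream = Null | Cons of 'a * (unit -> 'a stream)

(* A two-element stream whose tail announces when it is forced. *)
val s = Cons(1, fn () => (print "tail forced\n"; Cons(2, fn () => Null)))

(* Nothing has been printed so far: the tail is still a suspended function. *)

(* Pattern matching exposes the head and the suspended tail;
   applying t to () runs the suspended computation. *)
val second =
  case s of
    Null       => NONE
  | Cons(_, t) => (case t() of
                     Null       => NONE
                   | Cons(h, _) => SOME h)
(* "tail forced" is printed while second is computed; second = SOME 2 *)
```

Note that forcing the same tail again would print the message again: a 0-argument function delays evaluation, but it does not remember its result.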

That is, we use a 0-argument function to delay the evaluation until that function is called. This is commonly referred to as "lazy" evaluation, because the evaluation does not happen until it is needed. Many languages, including some versions of ML, support lazy evaluation directly in the language. As we saw with the substitution model, in ML the arguments to a function are evaluated first, and then the function is applied to the resulting values. This is referred to as "eager" evaluation, because arguments are evaluated before the call. In lazy evaluation, arguments are not evaluated until they are needed. To see the difference, consider:

foo(3 - 3, bigComputation 5)

In eager evaluation, both 3 - 3 and the call bigComputation 5 are evaluated before foo is called. Perhaps foo does nothing with its second argument when its first argument is zero; the big computation happens nonetheless. In lazy evaluation the arguments are not evaluated until foo uses them, so in such a situation the big computation would not happen. Of course, lazy evaluation requires passing around computations that will happen later. One way to do this is simply to wrap each computation in a 0-argument function. In implementing streams with delayed tails, we will use this wrapping approach explicitly rather than rely on any lazy evaluation features of the language.
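The difference can be observed directly. The sketch below is illustrative only: bigComputation, fooEager, and fooLazy are hypothetical names, and the "big" computation is a stand-in that prints a message so that we can see whether it ran.

```sml
(* Hypothetical stand-in for an expensive computation; it reports when it runs. *)
fun bigComputation (n: int): int =
  (print "big computation running\n"; n * n)

(* Eager version: both arguments are already values when the body starts. *)
fun fooEager (x: int, y: int): int =
  if x = 0 then 0 else x + y

(* Delayed version: the second argument is a 0-argument function,
   called only in the branch that needs its value. *)
fun fooLazy (x: int, y: unit -> int): int =
  if x = 0 then 0 else x + y()

val a = fooEager(3 - 3, bigComputation 5)          (* prints the message *)
val b = fooLazy(3 - 3, fn () => bigComputation 5)  (* prints nothing *)
val c = fooLazy(1, fn () => bigComputation 5)      (* prints the message *)
```

Both a and b are 0, but only the eager call pays for the computation it never uses; c is 26, because in that case the delayed argument is actually called.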

Let us now define the simplest stream - an infinite stream that consists of the same repeated value, ad infinitum:

fun const(c: 'a) = Cons(c, fn() => const c)

But how do we use such a stream? Will the suspended computations ever be triggered? Here are some functions that work on streams. Note that they are very similar to functions on lists.

exception Empty

(* Returns the first element of a stream. *)
fun hd(s: 'a stream): 'a =
  case s of 
    Null       => raise Empty
  | Cons(h, _) => h

(* Returns the stream that results after removing the first element. *)
fun tl(s: 'a stream): 'a stream =
  case s of
    Null       => raise Empty
  | Cons(_, t) => t()

(* Applies a function to every element of a stream. *)
fun map (f: 'a -> 'b) (s: 'a stream): 'b stream =
  case s of
    Null  => Null
  | Cons(h, t) => Cons(f h, fn () => map f (t()))


(* Returns the ordered list of the first n elements of the stream.  *)
fun takeN(s: 'a stream, n: int): 'a list =
  case (s, n) of
    (_, 0)          => []
  | (Null, _)       => raise Empty
  | (Cons(h, t), n) => h :: (takeN (t(), n - 1))


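(* Produces a stream of the values that satisfy a predicate.
   Caution: on an infinite stream in which no further element satisfies
   the predicate, the search for the next element does not terminate. *)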
fun filter (f: 'a -> bool) (s: 'a stream): 'a stream =
  case s of
    Null => Null
  | Cons(h, t) => if f(h) then Cons(h, fn () => filter f (t()))
                          else filter f (t())

Before demonstrating how these functions work, let us define two more streams:

fun nats(n: int) = Cons(n, fn () => nats(n + 1))
fun fibo(a: int, b: int) = Cons(a, fn () => fibo(b, a + b))

Function nats(n) generates the sequence of successive integers starting at the initial value n. Called with the argument 0, it produces the stream of natural numbers.

Function fibo produces Fibonacci-like sequences; when called with arguments 0 and 1, respectively, it generates the usual Fibonacci sequence 0, 1, 1, 2, 3, ...

The reader might have noticed that all three streams we have defined up to now are infinite, but this does not have to be the case. Indeed, it is easy to define finite, non-empty streams:

- val s1 = Cons(9, fn () => Cons(8, fn () => Cons(7, fn () => Null)))
val s1 = Cons (9,fn) : int stream

And now, let us demonstrate how stream functions work:

- takeN(fibo(0, 1), 10);
val it = [0,1,1,2,3,5,8,13,21,34] : int list
-
- hd s1;
val it = 9 : int
-
- tl s1;
val it = Cons (8,fn) : int stream
-
- hd(tl(tl s1));
val it = 7 : int
-
- takeN(filter (fn n => n mod 2 = 0) (fibo(0, 1)), 4);
val it = [0,2,8,34] : int list
-
- takeN(map (fn n => n * n) s1, 3);
val it = [81,64,49] : int list
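The transcript above does not exercise const or nats, so here is a short sketch that does, using only the definitions given earlier (repeated so that the fragment can be loaded on its own). The names sevens, firstNats, oddSquares, and lucas are illustrative.

```sml
datatype 'a stream = Null | Cons of 'a * (unit -> 'a stream)
exception Empty

fun const(c: 'a) = Cons(c, fn () => const c)
fun nats(n: int) = Cons(n, fn () => nats(n + 1))
fun fibo(a: int, b: int) = Cons(a, fn () => fibo(b, a + b))

fun map (f: 'a -> 'b) (s: 'a stream): 'b stream =
  case s of
    Null       => Null
  | Cons(h, t) => Cons(f h, fn () => map f (t()))

fun filter (f: 'a -> bool) (s: 'a stream): 'a stream =
  case s of
    Null       => Null
  | Cons(h, t) => if f h then Cons(h, fn () => filter f (t()))
                  else filter f (t())

fun takeN(s: 'a stream, n: int): 'a list =
  case (s, n) of
    (_, 0)          => []
  | (Null, _)       => raise Empty
  | (Cons(h, t), n) => h :: takeN(t(), n - 1)

val sevens    = takeN(const 7, 3)      (* [7,7,7] *)
val firstNats = takeN(nats 0, 5)       (* [0,1,2,3,4] *)

(* Stream functions compose: squares of the odd natural numbers. *)
val oddSquares =
  takeN(map (fn n => n * n) (filter (fn n => n mod 2 = 1) (nats 0)), 4)
(* [1,9,25,49] *)

(* fibo with other seeds gives other Fibonacci-like sequences. *)
val lucas = takeN(fibo(2, 1), 6)       (* [2,1,3,4,7,11] *)
```

In each case only as much of the infinite stream is computed as takeN asks for.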

Returning for a moment to finite streams, let us create a stream from a list of values given as argument:

fun fromList(l: 'a list): 'a stream =
  case l of
    []   => Null
  | h::t => Cons(h, fn () => fromList t)

- takeN(fromList [9, 7, 5, 3, 1], 4);
val it = [9,7,5,3] : int list

- takeN(fromList [9, 7, 5, 3, 1], 10);

uncaught exception Empty
  raised at: stdIn:824.30-824.35

Note that it is impossible to extract more elements from a finite stream than are available: takeN raises Empty as soon as it reaches Null while elements are still being requested.
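If raising an exception is not the behavior we want, one possible alternative is a variant that simply stops at the end of the stream. The function takeUpTo below is a hypothetical helper, not part of the definitions above; it is a sketch of one design choice among several.

```sml
datatype 'a stream = Null | Cons of 'a * (unit -> 'a stream)

fun fromList(l: 'a list): 'a stream =
  case l of
    []   => Null
  | h::t => Cons(h, fn () => fromList t)

(* Hypothetical variant of takeN: returns at most n elements and
   stops quietly when the stream ends, instead of raising Empty. *)
fun takeUpTo(s: 'a stream, n: int): 'a list =
  case (s, n) of
    (_, 0)          => []
  | (Null, _)       => []
  | (Cons(h, t), n) => h :: takeUpTo(t(), n - 1)

val firstFour = takeUpTo(fromList [9, 7, 5, 3, 1], 4)    (* [9,7,5,3] *)
val everything = takeUpTo(fromList [9, 7, 5, 3, 1], 10)  (* [9,7,5,3,1] *)
```

The trade-off is that a caller can no longer tell, from the result alone, whether the stream ran out; comparing the length of the result with n recovers that information.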

Given two streams, what are some meaningful operations that we can define on them? Perhaps surprisingly, one such operation is stream concatenation. Given streams s1 and s2, the concatenation of s1 and s2 consists of the ordered sequence of all values in s1, followed by the ordered sequence of all values in s2. Obviously, if s1 is infinite, then no value will ever be extracted from s2. On the other hand, a stream can be either finite or infinite, so it may happen that all values from s1 are consumed and values from s2 are then used:

fun concatenate(s1: 'a stream, s2: 'a stream): 'a stream =
  case s1 of
    Null       => s2
  | Cons(h, t) => Cons(h, fn () => concatenate(t(), s2))

- takeN(concatenate(fromList [1,2,3,4,5], fromList [6,7,8,9,10]),  7);
val it = [1,2,3,4,5,6,7] : int list
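The two cases discussed above (a finite first stream and an infinite first stream) can be sketched as follows. The definitions are repeated from earlier in these notes so that the fragment can be loaded on its own; the names mixed and stuck are illustrative.

```sml
datatype 'a stream = Null | Cons of 'a * (unit -> 'a stream)
exception Empty

fun nats(n: int) = Cons(n, fn () => nats(n + 1))

fun fromList(l: 'a list): 'a stream =
  case l of
    []   => Null
  | h::t => Cons(h, fn () => fromList t)

fun takeN(s: 'a stream, n: int): 'a list =
  case (s, n) of
    (_, 0)          => []
  | (Null, _)       => raise Empty
  | (Cons(h, t), n) => h :: takeN(t(), n - 1)

fun concatenate(s1: 'a stream, s2: 'a stream): 'a stream =
  case s1 of
    Null       => s2
  | Cons(h, t) => Cons(h, fn () => concatenate(t(), s2))

(* Finite stream followed by an infinite one: s2 is reached after 3 values. *)
val mixed = takeN(concatenate(fromList [1, 2, 3], nats 10), 6)
(* [1,2,3,10,11,12] *)

(* Infinite stream first: the value ~1 in s2 is never reached. *)
val stuck = takeN(concatenate(nats 0, fromList [~1]), 4)
(* [0,1,2,3] *)
```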

We finish today's discussion of streams by examining an implementation of the Sieve of Eratosthenes, possibly the oldest systematic method (algorithm) for generating the sequence of all prime numbers. The "sieve" can be described as follows:

step 1: Generate the sequence of natural numbers starting at 2.
step 2: Position yourself just before the beginning of the sequence.
step 3: Find the next available number in the sequence. Write it down; it is prime.
step 4: Cross out (delete) all multiples of the number identified in step 3.
step 5: Continue with step 3.

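Of course, one can never work with an actual infinite sequence of numbers, nor can one cross out an infinity of multiples. With streams, however, both the generation of the numbers and the crossing out can be delayed until a value is actually requested.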

We will now implement the Sieve:

fun sift (p: int) (s: int stream): int stream =
  filter (fn n => n mod p <> 0) s

fun sieve (s: int stream): int stream =
  case s of
    Null       => Null
  | Cons(p, t) => Cons(p, fn () => sieve(sift p (t())))

val primes = sieve(nats 2);

Did you ever wonder what the 312th prime is? You can now find out:

- List.hd(rev(takeN(primes, 312)));
val it = 2069 : int
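Building a 312-element list only to discard all but its last element is not necessary. As a sketch, a hypothetical helper nth (not defined earlier in these notes) can walk the stream directly; the rest of the fragment repeats the definitions above so that it can be loaded on its own.

```sml
datatype 'a stream = Null | Cons of 'a * (unit -> 'a stream)
exception Empty

fun nats(n: int) = Cons(n, fn () => nats(n + 1))

fun filter (f: 'a -> bool) (s: 'a stream): 'a stream =
  case s of
    Null       => Null
  | Cons(h, t) => if f h then Cons(h, fn () => filter f (t()))
                  else filter f (t())

(* Removes all multiples of p from the stream. *)
fun sift (p: int) (s: int stream): int stream =
  filter (fn n => n mod p <> 0) s

fun sieve (s: int stream): int stream =
  case s of
    Null       => Null
  | Cons(p, t) => Cons(p, fn () => sieve(sift p (t())))

(* Hypothetical helper: the element at 1-based position n.
   Raises Empty if the stream has fewer than n elements. *)
fun nth(s: 'a stream, n: int): 'a =
  case s of
    Null       => raise Empty
  | Cons(h, t) => if n <= 1 then h else nth(t(), n - 1)

val primes = sieve(nats 2)

val p10  = nth(primes, 10)    (* 29 *)
val p312 = nth(primes, 312)   (* 2069, as in the transcript above *)
```

Note that each call to nth recomputes the stream from its start, since the suspended tails do not remember their results.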