Iteration abstraction

Topics:

different interfaces for iterations
iterator pattern
implementing Iterator
coroutine iterators

We have been looking at collection abstractions but focusing on operations that return a single value. What if we want to do something with all or many of the elements in a collection? More generally, what if we want to have an operation whose result is an arbitrarily large number of values? Two examples:

Printing out all the elements in a set
Finding all the elements with keys in some range (e.g., dates)

An iteration abstraction is an operation that gives the client an arbitrarily long sequence of values. Ideally, it computes the elements of that sequence lazily, so that it does no work on elements the client never looks at.

Strategy 1: return an array

A straightforward approach is to just return a big array containing all the values the client might want. This is not a terrible approach, but it isn't lazy: the entire array needs to be computed. If the number of elements in the sequence is very large, the array will just be too big. Also, returning an array invites implementations that expose the rep when the underlying implementation stores the elements in an array. What does it mean to update that array?
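
To make the rep-exposure worry concrete, here is a minimal sketch. The class IntBag and its methods are hypothetical names invented for illustration and belong to no library. One method hands the client the internal array itself, the other hands back a copy; both are eager.

```java
/** Hypothetical collection whose rep is an int array. */
class IntBag {
    private final int[] elems;

    IntBag(int... values) {
        elems = values.clone();   // defensive copy on the way in
    }

    /** Exposes the rep: the caller receives the internal array itself. */
    int[] elementsExposed() {
        return elems;
    }

    /** Safer but still eager: the caller receives a fresh copy. */
    int[] elementsCopied() {
        return elems.clone();
    }

    boolean contains(int v) {
        for (int e : elems) {
            if (e == v) return true;
        }
        return false;
    }
}

class RepExposureDemo {
    public static void main(String[] args) {
        IntBag bag = new IntBag(1, 2, 3);
        bag.elementsCopied()[0] = 99;            // changes only the copy
        System.out.println(bag.contains(99));    // prints false
        bag.elementsExposed()[0] = 99;           // silently rewrites the bag's rep
        System.out.println(bag.contains(99));    // prints true
    }
}
```

The copying version avoids the exposure, but it pays for a full copy on every call, which is exactly the eagerness this strategy suffers from.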

To avoid committing the implementer to using arrays, we can instead add observer methods that support iteration through an array-like interface:

class Collection<T> {
  ...
  int numElements();
  T getElement(int n);
  ...
}

Given a collection c, we can write a loop using this interface:

  for (int i = 0; i < c.numElements(); i++) {
    T x = c.getElement(i);
  }

The downside of this is that it's often hard to implement random access to the nth element without simply computing the whole array of results. So we end up precomputing elements in a non-lazy (i.e., eager) way anyway.

Strategy 2: Iterator state in the collection (bad)

Another (even worse) idea is to have the collection know about the state of the iteration, with an interface like this:

class Collection<T> {
  ...
  /** Set the state of the iteration back to the first element. */
  void resetIteration();
  /** Return the next element in the sequence of iterated values. */
  T next();
  ...
}

This approach is tempting, but has serious problems. It makes it hard to share the object across different pieces of code, because there can be only one client trying to iterate over the object at a time. Otherwise, they will interfere with each other. Don't take this approach.
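
The interference is easy to reproduce. The sketch below assumes a hypothetical CursorBag class with the interface above, plus an assumed hasMore() helper, since the interface above does not say how the end of the sequence is detected. A nested loop over the same object silently loses the outer loop's position.

```java
/** Hypothetical collection that keeps the iteration cursor inside itself. */
class CursorBag {
    private final int[] elems;
    private int cursor = 0;   // the single, shared iteration state

    CursorBag(int... values) { elems = values.clone(); }

    void resetIteration() { cursor = 0; }

    /** Assumed helper, not part of the interface shown above. */
    boolean hasMore() { return cursor < elems.length; }

    int next() { return elems[cursor++]; }
}

class SharedCursorDemo {
    /** Tries to count all ordered pairs (x, y) with a nested loop. */
    static int countPairs(CursorBag bag) {
        int pairs = 0;
        bag.resetIteration();
        while (bag.hasMore()) {
            int x = bag.next();
            bag.resetIteration();      // the inner loop clobbers the outer position
            while (bag.hasMore()) {
                int y = bag.next();
                pairs++;
            }
            // The cursor is now at the end, so the outer loop stops too.
        }
        return pairs;
    }

    public static void main(String[] args) {
        // A correct nested loop over 3 elements would count 9 pairs.
        System.out.println(countPairs(new CursorBag(1, 2, 3)));   // prints 3
    }
}
```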

Strategy 3: Iterator pattern

The standard solution to the problem of supporting iteration abstraction is what is known as the iterator pattern. Iterator objects also go by other names, such as cursors and generators -- it's one of those wheels that has been reinvented a few times.

The idea is that since we can't have the collection keep track of the iteration state directly, we instead put that iteration state into a separate iterator object. Then, different clients can have their own iteration proceeding on the same object, because they will each have their own separate iterator object. Here's what the interface to that object looks like:

interface Iterator<T> {
    /** Whether there is a next element in the sequence. */
    boolean hasNext();
    /** Advance to the next element in the sequence and return it.
    Throw the exception NoSuchElementException if there is no next element. */
    T next();
    /** Remove the element most recently returned by next. */
    void remove();
}

Then we can provide iteration abstractions by defining operations that return an object that implements the Iterator interface. For example, the Collection class has an operation iterator for iterating through all the elements of a collection:

class Collection<T> {
    Iterator<T> iterator();
}

To use this interface, typically we use a for loop, as in the following example:

Collection<String> c;
for (Iterator<String> i = c.iterator(); i.hasNext(); ) {
    String x = i.next();
    // use x here
}

Java even has syntactic sugar for writing a loop like this:

Collection<String> c;
for (String x : c) {
    // use x here
}

Under the covers, exactly the same thing is happening when this loop runs, even though you don't have to declare an iterator object i.

Notice there's another operation in the interface, remove, which changes the underlying collection to remove the last element returned by next. Not every iterator supports this operation, because it doesn't make sense for every iteration abstraction to remove elements.
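
As a small illustration of remove, the sketch below filters a list in place through its iterator, then probes an iterator that does not support the operation. The class and method names (RemoveDemo, dropOdds, supportsRemove) are made up for this example; the library calls are from java.util.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

class RemoveDemo {
    /** Deletes every odd number from xs, in place, through the iterator. */
    static void dropOdds(List<Integer> xs) {
        for (Iterator<Integer> i = xs.iterator(); i.hasNext(); ) {
            int x = i.next();
            if (x % 2 != 0) {
                i.remove();   // removes the element just returned by next()
            }
        }
    }

    /**
     * Reports whether an iterator over xs accepts remove().
     * Assumes xs is nonempty; if remove() is supported, the first
     * element really is removed.
     */
    static boolean supportsRemove(List<Integer> xs) {
        Iterator<Integer> i = xs.iterator();
        i.next();
        try {
            i.remove();
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        List<Integer> xs = new ArrayList<>(Arrays.asList(1, 2, 3, 4, 5));
        dropOdds(xs);
        System.out.println(xs);   // prints [2, 4]
        System.out.println(supportsRemove(Collections.unmodifiableList(xs)));   // prints false
    }
}
```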

Because all Java collections provide an iterator method, we can iterate over the elements of a collection without needing to know what kind of collection it is, or how iteration is implemented for that collection, or even that the iterator comes from a collection object in the first place. That is the advantage of having an iteration abstraction.
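
A short sketch of that independence: the hypothetical helper sum below sees only an Iterator<Integer>, so the same code works whether the values come from a list, a set, or something that is not a collection at all.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class IterationClient {
    /** Sums whatever the iterator produces; knows nothing about its source. */
    static int sum(Iterator<Integer> it) {
        int total = 0;
        while (it.hasNext()) {
            total += it.next();
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(1, 2, 3);
        Set<Integer> set = new TreeSet<>(list);
        System.out.println(sum(list.iterator()));   // prints 6
        System.out.println(sum(set.iterator()));    // prints 6
    }
}
```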

Specifications for iterators

Specifications for iteration abstractions are similar to ordinary function specifications. There are a couple of issues specific to iteration. One is the order in which elements are produced by the iterator. It's useful to specify whether there is or is not an ordering.

A second issue is what operations can be performed during iteration. Usually these consist of the observers. However, sometimes observers have hidden side effects that conflict with iterators, and the client needs to know about this.
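
One example of an observer with a hidden side effect in the standard library is java.util.LinkedHashMap in access-order mode, where get reorders the entries and so counts as a structural modification. The sketch below wraps that behavior in a hypothetical helper; the failure is detected on a best-effort basis, so treat this as an illustration rather than a guarantee.

```java
import java.util.ConcurrentModificationException;
import java.util.LinkedHashMap;
import java.util.Map;

class HiddenSideEffectDemo {
    /** Returns true if calling get() during iteration broke the iterator. */
    static boolean getBreaksIteration(boolean accessOrder) {
        // The third constructor argument selects access order (true)
        // or insertion order (false).
        Map<String, Integer> m = new LinkedHashMap<>(16, 0.75f, accessOrder);
        m.put("a", 1);
        m.put("b", 2);
        m.put("c", 3);
        try {
            for (String k : m.keySet()) {
                m.get(k);   // looks like a pure observer
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(getBreaksIteration(false));   // prints false
        System.out.println(getBreaksIteration(true));    // prints true
    }
}
```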

Implementing Iterator

Java Iterators are a nice interface for providing iteration abstraction, but they do have a downside: they are not that easy to implement correctly. There are several problems for the implementer to confront:

The iterator must remember where it is in the iteration and be able to resume from that point on every call. This means the iterator has to carry some state of its own.
The methods next and hasNext do similar work and it can be tricky to avoid duplicating work across the two methods. The solution is to have hasNext save the work it does so that next can take advantage of it.
The underlying collection may be changed, either by invoking its methods or by invoking the remove method on this or another iterator. Any mutation that does not go through the current iterator invalidates it. The iterators of most built-in collections throw a ConcurrentModificationException in this case, on a best-effort basis.
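
The second point, having hasNext save its work for next, can be sketched with a filtering iterator. FilterIterator is a hypothetical class written for this illustration; it relies on the default remove method of java.util.Iterator (Java 8 or later), which throws UnsupportedOperationException.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

/** Hypothetical iterator yielding only the elements of source that satisfy keep. */
class FilterIterator<T> implements Iterator<T> {
    private final Iterator<T> source;
    private final Predicate<T> keep;
    private T lookahead;             // element found by hasNext, not yet returned
    private boolean ready = false;   // true when lookahead holds a valid element

    FilterIterator(Iterator<T> source, Predicate<T> keep) {
        this.source = source;
        this.keep = keep;
    }

    @Override
    public boolean hasNext() {
        // Search only if no element is already cached, so repeated
        // calls to hasNext do not skip anything or redo work.
        while (!ready && source.hasNext()) {
            T candidate = source.next();
            if (keep.test(candidate)) {
                lookahead = candidate;
                ready = true;
            }
        }
        return ready;
    }

    @Override
    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        ready = false;               // consume the cached element
        T result = lookahead;
        lookahead = null;
        return result;
    }
}
```

A client could write new FilterIterator<>(list.iterator(), x -> x % 2 == 0) to see only the even elements of a list of integers.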

Example: A list iterator. Suppose we have a linked list with a header object of class LinkedList (In this example, it's a parameterized class with a type parameter T).

class LinkedList<T> {
    ListNode<T> first;
    Iterator<T> iterator() {
        return new LLIter<T>(first);
    }
}

class ListNode<T> {
    T elem;
    ListNode<T> next;
}

The Iterator interface is implemented here by a separate class LLIter:

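import java.util.Iterator;
import java.util.NoSuchElementException;

class LLIter<T> implements Iterator<T> {
    ListNode<T> curr;   // next node to visit, or null when the iteration is done
    LLIter(ListNode<T> first) {
        curr = first;
    }
    public boolean hasNext() {
        return (curr != null);
    }
    public T next() {
        if (curr == null) throw new NoSuchElementException();
        T ret = curr.elem;   // element stored at the current node
        curr = curr.next;
        return ret;
    }
    public void remove() {
        // oops! can't implement without a pointer to the
        // previous node, so report the operation as unsupported.
        throw new UnsupportedOperationException();
    }
}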

Notice that we can only have one iterator object if any iterator object is mutating the collection. But if no iterators are mutating the collection, and there are no other mutations to the collection from other clients, then we could have any number of iterators. It is actually possible to implement iterators that work in the presence of mutations, but this requires that the collection object keep track of all the iterators that are attached to it, and update their state appropriately whenever a mutation occurs. Usually people don't bother to do this.
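
One way to support remove on a singly linked list is to have the iterator remember the node before the one it last returned. The following is a sketch under that assumption, using a hypothetical SimpleList class rather than the LinkedList and LLIter classes from this section. It makes no attempt to detect mutations made behind the iterator's back.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical singly linked list with an iterator that supports remove. */
class SimpleList<T> implements Iterable<T> {
    private static class Node<T> {
        T elem;
        Node<T> next;
        Node(T elem, Node<T> next) { this.elem = elem; this.next = next; }
    }

    private Node<T> first;

    /** Adds an element at the front of the list. */
    void addFirst(T elem) { first = new Node<>(elem, first); }

    @Override
    public Iterator<T> iterator() {
        return new Iterator<T>() {
            Node<T> upcoming = first;    // node whose element next() returns next
            Node<T> last = null;         // node most recently returned; null if none or removed
            Node<T> beforeLast = null;   // predecessor of last; null if last is the head

            @Override public boolean hasNext() { return upcoming != null; }

            @Override public T next() {
                if (upcoming == null) throw new NoSuchElementException();
                // If last was removed, its predecessor is still the
                // predecessor of upcoming, so beforeLast stays put.
                if (last != null) beforeLast = last;
                last = upcoming;
                upcoming = upcoming.next;
                return last.elem;
            }

            @Override public void remove() {
                if (last == null) throw new IllegalStateException();
                if (beforeLast == null) first = upcoming;   // unlink the head
                else beforeLast.next = upcoming;            // unlink an interior node
                last = null;   // forbids a second remove() before the next next()
            }
        };
    }
}
```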

This example showed how to implement iterators for a collection class, but we can implement iteration abstractions for other problems. For example, suppose we wanted to do some computation on all the prime numbers. We could define an iterator object that iterates over all primes!

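class Primes implements Iterator<Integer> {
    int curr = 1;   // last value returned (1 before the first call)
    public boolean hasNext() {
        return true;   // there is always another prime
    }
    public Integer next() {
        while (true) {
            curr++;
            boolean prime = true;
            // trial division by every i with i*i <= curr
            for (int i = 2; i*i <= curr; i++) {
                if (curr % i == 0) { prime = false; break; }
            }
            if (prime) return curr;
        }
    }
}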

for (Iterator<Integer> i = new Primes(); i.hasNext(); ) {
    int p = i.next();
    // use p, and break out of the loop when done: hasNext() never returns false
}

Coroutine iterators

Some languages, such as C# 2.0 and the scripting language Ruby, support an easier way to implement iterators: as coroutines. You can think of these as methods that run alongside the client and send back values whenever they want, without returning. Instead of return, a coroutine uses a yield statement to send values to the client. This makes writing iterator code simple. For example, in the JMatch language developed here at Cornell, you can implement a tree iterator as follows:

class TreeNode {
  TreeNode left, right;
  int val;

  int elements() iterates(result) {
    foreach (left != null && int elt = left.elements())
      yield elt;
    yield val;
    foreach (right != null && int elt = right.elements())
      yield elt;
  }
}

This is much shorter than the code needed to implement the same iterator in Java.
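
For comparison, here is roughly what an in-order iterator over a binary tree might look like in plain Java. This is a sketch written for illustration: the class IntTree and its nested InOrder class are hypothetical names, and the explicit stack stands in for the bookkeeping that yield handles automatically.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical Java counterpart of a binary tree with an in-order iterator. */
class IntTree implements Iterable<Integer> {
    final IntTree left, right;
    final int val;

    IntTree(IntTree left, int val, IntTree right) {
        this.left = left;
        this.val = val;
        this.right = right;
    }

    @Override
    public Iterator<Integer> iterator() {
        return new InOrder(this);
    }

    private static class InOrder implements Iterator<Integer> {
        // Nodes whose own value and right subtree are still to be visited.
        private final Deque<IntTree> pending = new ArrayDeque<>();

        InOrder(IntTree root) {
            descendLeft(root);
        }

        /** Pushes n and the whole chain of left children below it. */
        private void descendLeft(IntTree n) {
            for (; n != null; n = n.left) {
                pending.push(n);
            }
        }

        @Override
        public boolean hasNext() {
            return !pending.isEmpty();
        }

        @Override
        public Integer next() {
            if (pending.isEmpty()) throw new NoSuchElementException();
            IntTree n = pending.pop();
            descendLeft(n.right);   // the right subtree comes after n itself
            return n.val;
        }
    }
}
```

Even without remove, the Java version has to manage its own stack and split the traversal across the constructor, hasNext, and next.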

Strategy 4: Using function objects

A final way to support iteration abstraction is to wrap up the body of the loop that you want to do on each iteration as a function object. Here's how that approach would look as an interface:

class Collection<T> {
    void iterate(Function<T> body);
}

interface Function<T> {
    /** Perform some operation on elem. Return true if the loop should
     * continue. */
    boolean call(T elem);
}

The idea is that iterate calls body.call(v) for every value v to be sent to the client, at exactly the same places that a coroutine iterator would use yield. The client provides an implementation of Function<T>.call that does whatever it wants to on the element v. So instead of writing:

for (int x : c) {
    print(x);
}

We would write something like this:

c.iterate(new MyLoopBody());

...

class MyLoopBody implements Function<Integer> {
    public boolean call(Integer x) {
        print(x);
        return true;
    }
}

This is pretty easy to implement, but it's not quite as convenient for the client. The whole loop needs to run all at once -- the client can't pick off a few values, then save the iterator away for later. On the other hand, that isn't usually how iteration abstractions end up being used.
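
In Java 8 and later, a lambda can play the role of the function object, which removes most of the boilerplate. The sketch below is an assumption-laden illustration: LoopBody has the same shape as the Function<T> interface above but is renamed to avoid confusion with java.util.function.Function, and Bag is a hypothetical collection. Returning false from the body stops the loop early.

```java
import java.util.ArrayList;
import java.util.List;

/** Same shape as the Function<T> interface above, under a hypothetical name. */
interface LoopBody<T> {
    /** Perform some operation on elem. Return true if the loop should continue. */
    boolean call(T elem);
}

/** Hypothetical collection offering iteration through a function object. */
class Bag<T> {
    private final List<T> elems = new ArrayList<>();

    void add(T elem) { elems.add(elem); }

    /** Calls body on each element in order, stopping early if it returns false. */
    void iterate(LoopBody<T> body) {
        for (T e : elems) {
            if (!body.call(e)) return;
        }
    }
}

class FunctionObjectDemo {
    /** Collects elements until the first negative one is seen. */
    static List<Integer> prefixBeforeNegative(Bag<Integer> bag) {
        List<Integer> seen = new ArrayList<>();
        bag.iterate(x -> {
            if (x < 0) return false;   // plays the role of break
            seen.add(x);
            return true;               // keep going
        });
        return seen;
    }

    public static void main(String[] args) {
        Bag<Integer> bag = new Bag<>();
        bag.add(3);
        bag.add(1);
        bag.add(-2);
        bag.add(7);
        System.out.println(prefixBeforeNegative(bag));   // prints [3, 1]
    }
}
```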

Summary

We have seen a couple of good options for declaring and implementing iteration abstractions. A mark of good ADT design is interfaces that contain clear iteration abstractions. When you're writing code, think about whether an iteration abstraction would meet the needs of your clients, and watch out for the pitfalls discussed here!