We have been looking at collection abstractions but focusing on operations that return a single value. What if we want to do something with all or many of the elements in a collection? More generally, what if we want to have an operation whose result is an arbitrarily large number of values?
An iteration abstraction is an operation that gives the client an arbitrarily long sequence of values. Ideally, it computes the elements in that sequence lazily, so it doesn't do work computing elements that might never be used if the client doesn't look that far into the sequence.
A straightforward approach is to just return a big array containing all the values the client might want. This is not a terrible approach, but it isn't lazy: the entire array needs to be computed. If the number of elements in the sequence is very large, the array will just be too big. Also, returning an array invites implementations that expose the rep when the underlying implementation stores the elements in an array. What does it mean to update that array?
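To make the rep-exposure concern concrete, here is a minimal sketch. The class `Playlist` and its methods are hypothetical names chosen for illustration; the point is only the difference between handing out the internal array and handing out a copy.

```java
import java.util.Arrays;

// Hypothetical collection that stores its elements in an array.
class Playlist {
    private final String[] songs = {"a", "b", "c"};

    // Exposes the rep: the caller gets the very array the object uses internally.
    String[] elementsExposed() {
        return songs;
    }

    // Safer: the caller gets a copy, at the cost of eagerly building a whole new array.
    String[] elementsCopied() {
        return Arrays.copyOf(songs, songs.length);
    }

    String first() {
        return songs[0];
    }
}

class RepExposureDemo {
    public static void main(String[] args) {
        Playlist p = new Playlist();
        p.elementsCopied()[0] = "x";   // changes only the copy
        System.out.println(p.first()); // prints a
        p.elementsExposed()[0] = "x";  // silently rewrites the playlist's own state
        System.out.println(p.first()); // prints x
    }
}
```

Neither variant is lazy: both hand the client every element up front.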
To avoid committing the implementer to using arrays, we can alternatively add observer methods for doing iteration, but have an array-like interface:
class Collection<T> {
    ...
    int numElements();
    T getElement(int n);
    ...
}
Given a collection c, we can write a loop using this interface:
for (int i = 0; i < c.numElements(); i++) {
    T x = c.getElement(i);
}
The downside of this is that it's often hard to implement random access to the nth element without simply computing the whole array of results. So we end up precomputing elements in a non-lazy (i.e., eager) way anyway.
Another (even worse) idea is to have the collection know about the state of the iteration, with an interface like this:
class Collection<T> {
    ...
    /** Set the state of the iteration back to the first element. */
    void resetIteration();
    /** Return the next element in the sequence of iterated values. */
    T next();
    ...
}
This approach is tempting, but has serious problems. It makes it hard to share the object across different pieces of code, because there can be only one client trying to iterate over the object at a time. Otherwise, they will interfere with each other. Don't take this approach.
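The interference is easy to reproduce even within a single client. The sketch below is an assumption-laden illustration: `StatefulBag` is a hypothetical class, and `hasMore` is a helper we add so the loops can terminate (the interface above does not include one). A nested loop over three elements should visit nine pairs, but the inner loop resets and exhausts the one shared cursor.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical collection that keeps the iteration state inside itself.
class StatefulBag {
    private final List<Integer> items = new ArrayList<>();
    private int cursor = 0; // a single cursor shared by every client

    void add(int x) { items.add(x); }
    void resetIteration() { cursor = 0; }
    boolean hasMore() { return cursor < items.size(); } // assumed helper
    Integer next() { return items.get(cursor++); }
}

class SharedCursorDemo {
    // Tries to count all ordered pairs (a, b) with a nested loop.
    static int countPairs(StatefulBag bag) {
        int pairs = 0;
        bag.resetIteration();
        while (bag.hasMore()) {
            bag.next();
            bag.resetIteration(); // the inner loop clobbers the outer loop's position
            while (bag.hasMore()) {
                bag.next();
                pairs++;
            }
            // The cursor is now at the end, so the outer loop stops after one pass.
        }
        return pairs;
    }

    public static void main(String[] args) {
        StatefulBag bag = new StatefulBag();
        bag.add(1);
        bag.add(2);
        bag.add(3);
        System.out.println(countPairs(bag)); // prints 3, not the intended 9
    }
}
```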
The standard solution to the problem of supporting iteration abstraction is what is known as the iterator pattern. It is also known as cursor objects, generators, and some other names -- it's one of those wheels that has been reinvented a few times.
The idea is that since we can't have the collection keep track of the iteration state directly, we instead put that iteration state into a separate iterator object. Then, different clients can have their own iteration proceeding on the same object, because they will each have their own separate iterator object. Here's what the interface to that object looks like:
interface Iterator<T> {
    /** Whether there is a next element in the sequence. */
    boolean hasNext();
    /** Advance to the next element in the sequence and return it.
     *  Throw the exception NoSuchElementException if there is no next element. */
    T next();
    /** Remove the current element. */
    void remove();
}
Then we can provide iteration abstractions by defining operations that return an object that implements the Iterator interface. For example, the Collection class has an operation iterator for iterating through all the elements of a collection:
class Collection<T> { Iterator<T> iterator(); }
To use this interface, typically we use a for loop, as in the following example:
Collection<String> c;
for (Iterator<String> i = c.iterator(); i.hasNext(); ) {
    String x = i.next();
    // use x here
}
Java even has syntactic sugar for writing a loop like this:
Collection<String> c;
for (String x : c) {
    // use x here
}
Under the covers, exactly the same thing is happening when this loop runs, even though you don't have to declare an iterator object i.
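In Java, the enhanced for loop works on any object whose class implements `java.lang.Iterable`, not only on library collections. As a small sketch, the hypothetical class `Countdown` below supplies an `iterator()` method, and the two helper methods show that the sugared loop and the explicit iterator loop produce the same sequence.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical iterable that yields from, from-1, ..., 1.
class Countdown implements Iterable<Integer> {
    private final int from;

    Countdown(int from) { this.from = from; }

    @Override
    public Iterator<Integer> iterator() {
        return new Iterator<Integer>() {
            private int remaining = from; // each iterator has its own state

            @Override
            public boolean hasNext() { return remaining > 0; }

            @Override
            public Integer next() {
                if (remaining <= 0) throw new NoSuchElementException();
                return remaining--;
            }
        };
    }
}

class ForEachDemo {
    static String viaForEach(Countdown c) {
        StringBuilder sb = new StringBuilder();
        for (int x : c) sb.append(x);
        return sb.toString();
    }

    static String viaExplicitIterator(Countdown c) {
        StringBuilder sb = new StringBuilder();
        for (Iterator<Integer> i = c.iterator(); i.hasNext(); ) {
            int x = i.next();
            sb.append(x);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Countdown c = new Countdown(3);
        System.out.println(viaForEach(c));          // prints 321
        System.out.println(viaExplicitIterator(c)); // prints 321
    }
}
```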
Notice there's another operation in the interface, remove, which changes the underlying collection to remove the last element returned by next. Not every iterator supports this operation, because it doesn't make sense for every iteration abstraction to remove elements.
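As a short illustration with the standard library, the sketch below filters a list in place through its iterator and then probes an unmodifiable list, whose iterator rejects remove with `UnsupportedOperationException`. The class and method names are ours, chosen for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

class IteratorRemoveDemo {
    // Copies the input and removes the even numbers using the iterator's own remove().
    static List<Integer> dropEvens(List<Integer> input) {
        List<Integer> result = new ArrayList<>(input);
        for (Iterator<Integer> i = result.iterator(); i.hasNext(); ) {
            if (i.next() % 2 == 0) {
                i.remove(); // removes the element most recently returned by next()
            }
        }
        return result;
    }

    // Returns true if the iterator of a non-empty list refuses to remove elements.
    static boolean removeIsUnsupported(List<Integer> list) {
        Iterator<Integer> i = list.iterator();
        i.next();
        try {
            i.remove();
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(dropEvens(Arrays.asList(1, 2, 3, 4))); // prints [1, 3]
        List<Integer> readOnly = Collections.unmodifiableList(Arrays.asList(1, 2));
        System.out.println(removeIsUnsupported(readOnly));        // prints true
    }
}
```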
Because all Java collections provide an iterator method, we can iterate over the elements of a collection without needing to know what kind of collection it is, or how iteration is implemented for that collection, or even that the iterator comes from a collection object in the first place. That is the advantage of having an iteration abstraction.
Specifications for iteration abstractions are similar to ordinary function specifications. There are a couple of issues specific to iteration. One is the order in which elements are produced by the iterator. It's useful to specify whether there is or is not an ordering.
A second issue is what operations can be performed during iteration. Usually these consist of the observers. However, sometimes observers have hidden side effects that conflict with iterators, and the client needs to know about this.
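The standard library shows why the ordering part of the specification matters. `TreeSet` documents that its iterator returns elements in ascending order, and `LinkedHashSet` documents insertion order, so a client may rely on both. The helper below is a hypothetical name of ours; it simply records whatever order an iterator produces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.TreeSet;

class IterationOrderDemo {
    // Collects the elements in whatever order the collection's iterator produces them.
    static List<Integer> inIterationOrder(Collection<Integer> c) {
        List<Integer> out = new ArrayList<>();
        for (int x : c) out.add(x);
        return out;
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(3, 1, 2);
        // TreeSet specifies ascending order.
        System.out.println(inIterationOrder(new TreeSet<Integer>(input)));       // prints [1, 2, 3]
        // LinkedHashSet specifies insertion order.
        System.out.println(inIterationOrder(new LinkedHashSet<Integer>(input))); // prints [3, 1, 2]
    }
}
```

A plain `HashSet` makes no ordering promise, so client code should not depend on the order it happens to produce.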
Java Iterators are a nice interface for providing iteration abstraction, but they do have a downside: they are not that easy to implement correctly. There are several problems for the implementer to confront:
The methods next and hasNext do similar work, and it can be tricky to avoid duplicating work across the two methods. The solution is to have hasNext save the work it does so that next can take advantage of it.
The underlying collection may be changed during iteration, either through its own methods or through the remove method on this or another iterator. Mutations except through the current iterator invalidate the current iterator. The built-in iterators throw a ConcurrentModificationException in this case.
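One common way to let hasNext save its work for next is a one-element lookahead. The sketch below is a hypothetical filtering iterator (the name `FilterIterator` and its fields are ours): hasNext searches for the next matching element and caches it, and next hands back the cached element instead of searching again.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Hypothetical iterator that yields only the elements of `source` satisfying `keep`.
class FilterIterator<T> implements Iterator<T> {
    private final Iterator<T> source;
    private final Predicate<T> keep;
    private T lookahead;   // element found by hasNext() but not yet returned
    private boolean ready; // true when `lookahead` holds a valid element

    FilterIterator(Iterator<T> source, Predicate<T> keep) {
        this.source = source;
        this.keep = keep;
    }

    @Override
    public boolean hasNext() {
        // Search only if no element is cached; repeated calls do no extra work.
        while (!ready && source.hasNext()) {
            T candidate = source.next();
            if (keep.test(candidate)) {
                lookahead = candidate;
                ready = true;
            }
        }
        return ready;
    }

    @Override
    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        ready = false;
        T result = lookahead;
        lookahead = null; // drop the reference so it can be collected
        return result;
    }
}

class FilterIteratorDemo {
    public static void main(String[] args) {
        Iterator<Integer> odds = new FilterIterator<Integer>(
                Arrays.asList(1, 2, 3, 4, 5, 6).iterator(), x -> x % 2 == 1);
        while (odds.hasNext()) {
            System.out.println(odds.next()); // prints 1, then 3, then 5
        }
    }
}
```

Because next calls hasNext itself, the iterator also behaves correctly when a client calls next without checking hasNext first.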
Example: A list iterator. Suppose we have a linked list with a header object of class LinkedList (in this example, it's a parameterized class with a type parameter T).
class LinkedList<T> {
    ListNode<T> first;
    Iterator<T> iterator() {
        return new LLIter<T>(first);
    }
}
class ListNode<T> {
    T elem;
    ListNode<T> next;
}
The Iterator interface is implemented here by a separate class LLIter:
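import java.util.Iterator;
import java.util.NoSuchElementException;

class LLIter<T> implements Iterator<T> {
    ListNode<T> curr; // node holding the next element to return, or null at the end

    LLIter(ListNode<T> first) {
        curr = first;
    }
    public boolean hasNext() {
        return (curr != null);
    }
    public T next() {
        if (curr == null) throw new NoSuchElementException();
        T ret = curr.elem; // return the element, not the next node
        curr = curr.next;
        return ret;
    }
    public void remove() {
        // oops! can't implement without a pointer to
        // previous node.
        throw new UnsupportedOperationException();
    }
}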
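One way to make remove possible on a singly linked list is for the iterator to remember the node it last returned and that node's predecessor. The following is a sketch under our own assumptions, not the design used above: the names `Cell`, `SimpleList`, and `RemovingIter` are hypothetical, and the iterator holds a reference to the list so that it can update the head pointer.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

class Cell<T> {
    T elem;
    Cell<T> next;
    Cell(T elem, Cell<T> next) { this.elem = elem; this.next = next; }
}

class SimpleList<T> implements Iterable<T> {
    Cell<T> first;

    void addFirst(T x) { first = new Cell<>(x, first); }

    @Override
    public Iterator<T> iterator() { return new RemovingIter<>(this); }
}

class RemovingIter<T> implements Iterator<T> {
    private final SimpleList<T> list;
    private Cell<T> curr;       // next cell to return
    private Cell<T> last;       // cell most recently returned by next(), or null
    private Cell<T> beforeLast; // predecessor of `last`, or null if `last` is the head

    RemovingIter(SimpleList<T> list) {
        this.list = list;
        this.curr = list.first;
    }

    @Override
    public boolean hasNext() { return curr != null; }

    @Override
    public T next() {
        if (curr == null) throw new NoSuchElementException();
        // If `last` was not removed, it is the predecessor of the cell we return now.
        if (last != null) beforeLast = last;
        last = curr;
        curr = curr.next;
        return last.elem;
    }

    @Override
    public void remove() {
        if (last == null) throw new IllegalStateException();
        if (beforeLast == null) list.first = curr; // unlink the head
        else beforeLast.next = curr;               // unlink an interior cell
        last = null; // forbid a second remove() before the next call to next()
    }
}

class RemovingIterDemo {
    public static void main(String[] args) {
        SimpleList<Integer> list = new SimpleList<>();
        for (int x = 4; x >= 1; x--) list.addFirst(x); // list is 1, 2, 3, 4
        for (Iterator<Integer> i = list.iterator(); i.hasNext(); ) {
            if (i.next() % 2 == 0) i.remove();
        }
        for (int x : list) System.out.println(x); // prints 1, then 3
    }
}
```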
Notice that we can only have one iterator object if any iterator object is mutating the collection. But if no iterators are mutating the collection, and there are no other mutations to the collection from other clients, then we could have any number of iterators. It is actually possible to implement iterators that work in the presence of mutations, but this requires that the collection object keep track of all the iterators that are attached to it, and update their state appropriately whenever a mutation occurs. Usually people don't bother to do this.
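The library collections take the simpler route of detecting the problem rather than repairing it. The sketch below (class and method names are ours) adds an element to an `ArrayList` behind an active iterator's back; the next call on that iterator fails with `ConcurrentModificationException`. The library documents this detection as best-effort, so treat it as a debugging aid rather than a guarantee.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

class InvalidationDemo {
    // Returns true if a mutation that bypasses the iterator is detected on the next call.
    static boolean mutationIsDetected() {
        List<String> names = new ArrayList<>(Arrays.asList("a", "b", "c"));
        Iterator<String> i = names.iterator();
        i.next();
        names.add("d"); // mutation that does not go through the iterator
        try {
            i.next();
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(mutationIsDetected()); // prints true
    }
}
```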
This example showed how to implement iterators for a collection class, but we can implement iteration abstractions for other problems. For example, suppose we wanted to do some computation on all the prime numbers. We could define an iterator object that iterates over all primes!
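class Primes implements Iterator<Integer> {
    int curr = 1; // last number examined
    public boolean hasNext() {
        return true; // there is always another prime
    }
    public Integer next() {
        while (true) {
            curr++;
            boolean prime = true;
            // trial division by every i with i*i <= curr
            for (int i = 2; i*i <= curr; i++) {
                if (curr % i == 0) { prime = false; break; }
            }
            if (prime) return curr;
        }
    }
}
for (Iterator<Integer> i = new Primes(); i.hasNext(); ) {
    int p = i.next();
    // use p
}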
Some languages let an iterator be written as a coroutine. Instead of using return, a coroutine uses a yield statement to send values to the client. This makes writing iterator code simple.
For example, in the JMatch language developed here at Cornell, you can implement a tree iterator as follows:
class TreeNode {
    TreeNode left, right;
    int val;
    int elements() iterates(result) {
        foreach (left != null && int elt = left.elements())
            yield elt;
        yield val;
        foreach (right != null && int elt = right.elements())
            yield elt;
    }
}
This is much shorter than the code needed to implement the same iterator in Java.
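For comparison, here is one way the same in-order traversal might look as a Java iterator. This is a sketch with hypothetical names (`IntTree`, `InOrderIter`), not the only possible design: because Java has no yield, the iterator must keep the traversal state that the coroutine kept implicitly, here as an explicit stack of nodes still to be visited.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.NoSuchElementException;

class IntTree {
    IntTree left, right;
    int val;

    IntTree(IntTree left, int val, IntTree right) {
        this.left = left;
        this.val = val;
        this.right = right;
    }

    Iterator<Integer> elements() { return new InOrderIter(this); }
}

class InOrderIter implements Iterator<Integer> {
    // Nodes whose value and right subtree still have to be visited.
    private final Deque<IntTree> pending = new ArrayDeque<>();

    InOrderIter(IntTree root) { pushLeftSpine(root); }

    // Pushes node and all of its left descendants, so the leftmost ends up on top.
    private void pushLeftSpine(IntTree node) {
        for (; node != null; node = node.left) pending.push(node);
    }

    @Override
    public boolean hasNext() { return !pending.isEmpty(); }

    @Override
    public Integer next() {
        if (pending.isEmpty()) throw new NoSuchElementException();
        IntTree node = pending.pop();
        pushLeftSpine(node.right); // the right subtree comes after this node's value
        return node.val;
    }
}

class InOrderDemo {
    public static void main(String[] args) {
        IntTree tree = new IntTree(
                new IntTree(null, 1, null),
                2,
                new IntTree(null, 3, new IntTree(null, 4, null)));
        for (Iterator<Integer> i = tree.elements(); i.hasNext(); ) {
            System.out.println(i.next()); // prints 1, 2, 3, 4 on separate lines
        }
    }
}
```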
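Another way to provide iteration in Java is to have the client pass the loop body to the collection as a function object:
class Collection<T> {
    void iterate(Function<T> body);
}
interface Function<T> {
    /** Perform some operation on elem. Return true if the loop should
     *  continue. */
    boolean call(T elem);
}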
The idea is that iterate calls body.call(v) for every value v to be sent to the client, at exactly the same places that a coroutine iterator would use yield. The client provides an implementation of Function<T>.call that does whatever it wants to on the element v. So instead of writing:
for (int x : c) {
    print(x);
}
We would write something like this:
c.iterate(new MyLoopBody());
...
class MyLoopBody implements Function<Integer> {
    public boolean call(Integer x) {
        print(x);
        return true;
    }
}
This is pretty easy to implement, but it's not quite as convenient for the client. The whole loop needs to run all at once -- the client can't pick off a few values, then save the iterator away for later. On the other hand, that isn't usually how iteration abstractions end up being used.
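With lambda expressions (Java 8 and later), the client side of this style becomes much lighter, and the boolean result gives the client a way to stop early. The sketch below uses hypothetical names (`LoopBody`, `Bag`) that mirror the Function interface described above rather than any library API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback interface mirroring the Function<T> described in the text.
interface LoopBody<T> {
    /** Perform some operation on elem. Return true if the loop should continue. */
    boolean call(T elem);
}

class Bag<T> {
    private final List<T> items = new ArrayList<>();

    void add(T x) { items.add(x); }

    // Calls body.call(v) for each element until the body asks to stop.
    void iterate(LoopBody<T> body) {
        for (T v : items) {
            if (!body.call(v)) return;
        }
    }
}

class CallbackDemo {
    // Collects at most the first two elements, then tells the loop to stop.
    static List<Integer> firstTwo(Bag<Integer> bag) {
        List<Integer> seen = new ArrayList<>();
        bag.iterate(x -> {
            seen.add(x);
            return seen.size() < 2;
        });
        return seen;
    }

    public static void main(String[] args) {
        Bag<Integer> bag = new Bag<>();
        bag.add(10);
        bag.add(20);
        bag.add(30);
        System.out.println(firstTwo(bag)); // prints [10, 20]
    }
}
```

The client can stop early, but it still cannot pause the loop and resume it later, which is the limitation noted above.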
We have seen a couple of good options for declaring and implementing iteration abstractions. A mark of good ADT design is interfaces that contain clear iteration abstractions. When you're writing code, think about whether an iteration abstraction would meet the needs of your clients, and watch out for the pitfalls discussed here!