CS 312 Lecture 16
Mutable data abstractions

Evaluation with a store

Once we add refs and arrays to SML, we can no longer reason about evaluation of programs as simply. Previously we could think about evaluation as involving a series of changes to a system configuration that consisted of just a single program term (i.e., expression). Each evaluation step took one subterm of the term and reduced it to a value. With imperative update, the system configuration comprises not only a term but also a store that records for each store location what value is in it.

In response to imperative updates, such as uses of the assignment operator :=, the store part of the configuration changes. So an evaluation step can affect both parts of the configuration. Reasoning about the changes to the store is more difficult than reasoning about the changes to the program term because where in the store the update happens is determined by locations that happen to have been computed in the term. In practice this leads to a lot of bugs, so it is a good idea to use imperative update in a limited, careful way.
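As a small illustration of a configuration with a store, the sketch below allocates two locations and updates one of them through a second name. The variable names are made up for this example.

```sml
(* Two names for one store location, plus one fresh location. *)
val x = ref 0          (* allocates a location holding 0 *)
val y = x              (* y names the same location as x *)
val z = ref 0          (* a different location that also holds 0 *)

val () = y := 5        (* updates the location shared by x and y *)

val viaX = !x          (* 5: the update through y is visible through x *)
val viaZ = !z          (* 0: z's location was not touched *)
```

Which location the assignment changes depends on what y evaluated to, not on the name written in the program text, which is why updates are harder to track than term reductions.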

Mutable data abstraction

Mutable data abstractions are abstractions whose value can change over time. We have avoided using them until now because they are harder to reason about than immutable (functional) data abstractions. But for solving some problems they offer an advantage in efficiency.

Arrays

An important kind of mutable data structure that SML provides is the array.  The type t array is in fact very similar to the Java array type t[].  Arrays generalize refs in that they are a sequence of mutable cells containing values.  We can think of a ref cell as an array of size 1.  Here's a partial signature for the builtin Array structure for SML, including specifications:

  signature ARRAY =
    sig
      (* Overview: an 'a array is a mutable fixed-length sequence of
       * elements of type 'a. *)
      type 'a array

      (* array(n,x) is a new array of length n whose elements are
       * all equal to x. *)
      val array : int * 'a -> 'a array

      (* fromList(lst) is a new array containing the values in lst *)
      val fromList : 'a list -> 'a array

      exception Subscript (* indicates an out-of-bounds array index *)

      (* sub(a,i) is the ith element in a. If i is
       * out of bounds, raise Subscript *)
      val sub : 'a array * int -> 'a

      (* update(a,i,x)
       * Effects: Set the ith element of a to x
       * Raise Subscript if i is not a legal index into a *)
      val update : 'a array * int * 'a -> unit

      (* length(a) is the length of a *)
      val length : 'a array -> int

      ...
    end

See the SML documentation for more information on the operations available on arrays.
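A short usage sketch of these operations, written against the Basis Library's Array structure with qualified names. The bindings a, middle, n, and outOfBounds are chosen only for this example.

```sml
val a = Array.array (3, 0)             (* a new array holding 0, 0, 0 *)
val () = Array.update (a, 1, 42)       (* a now holds 0, 42, 0 *)
val middle = Array.sub (a, 1)          (* 42 *)
val n = Array.length a                 (* 3 *)

(* An out-of-bounds index raises Subscript instead of returning a value. *)
val outOfBounds = (Array.sub (a, 3); false) handle Subscript => true
```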

Notice that we have started using a new kind of clause in the specification, the effects clause. This clause specifies side effects that the operation has beyond the value it returns. When a routine has a side effect, there should be an "Effects:" clause to explicitly warn the user that a side effect may occur. For example, the update function returns no interesting value, but it does have a side effect.

An imperative update to a mutable data abstraction is also known as a destructive update, because it "destroys" the old value of the data structure. An assignment to an array element changes the array in place, destroying the old sequence of elements that formerly made up the array. When a destructive operation is performed on a mutable data abstraction, it looks to the client as if an imperative assignment has been performed, changing the abstraction to refer to a new value.

Programming in an imperative style is trickier than in a functional style exactly because the programmer has to be sure that the old value of the mutable data is no longer needed at the time that a destructive update is performed. In general it is hard to know whether some other part of the program still holds a reference to the data and does not expect the side effect.

Mutable sets and specifying side effects

Mutable collections such as sets and maps are another important kind of mutable data abstraction. We've seen several different implementations of sets thus far, but they have implemented an immutable set abstraction. A mutable set is a set that can be imperatively updated to include more elements, or to remove some elements. 

Here is an example of a signature for a mutable set. This signature shows an important issue in writing effects clauses. To specify a side effect, sometimes we need to be able to talk about the state of a mutable value both before and after the routine is executed. Writing "_pre" or "_post" after the name of a variable is a compact way of talking about the state of the value in that variable before and after the function executes, respectively.

signature MSET = sig
  (* Overview: a set is a mutable set of items of type elem.
   * For example, if elem is int, then a set might be
   * {1,-11,0}, {}, or {1001} *)
  type elem
  type set
  (* empty() creates a new empty set *)
  val empty : unit -> set
  (* Effects: add(s,x) adds the element x to s if it is
   * not there already: s_post = s_pre U {x} *)
  val add: set * elem -> unit
  (* Effects: remove(s,x) removes the element x from s if it is
   * there already: s_post = s_pre - {x} *)
  val remove: set * elem -> unit
  (* member(s,x) is whether x is a member of s *)
  val member: set * elem -> bool
  (* size(s) is the number of elements in s *)
  val size: set -> int
  (* fold over the elements of the set *)
  val fold: ((elem*'b)->'b) -> 'b -> set -> 'b
  (* fromList(lst) is a new set containing the elements of lst *)
  val fromList: elem list -> set
  (* toList(s) is a list of the elements of s *)
  val toList: set -> elem list
end
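To see how a client experiences these effects clauses, here is a minimal sketch of a list-backed mutable set of ints. The signature MSET_SKETCH is a trimmed copy so that the example stands alone, and the structure IntListMSet is hypothetical; it is not an implementation from these notes.

```sml
(* Trimmed copy of the mutable-set signature so the sketch stands alone. *)
signature MSET_SKETCH = sig
  type elem
  type set
  val empty : unit -> set
  val add : set * elem -> unit
  val remove : set * elem -> unit
  val member : set * elem -> bool
  val size : set -> int
end

(* Hypothetical list-backed implementation.
 * Rep invariant: the list inside the ref has no duplicates. *)
structure IntListMSet :> MSET_SKETCH where type elem = int = struct
  type elem = int
  type set = int list ref
  fun empty () = ref []
  fun member (s, x) = List.exists (fn y => y = x) (!s)
  fun add (s, x) = if member (s, x) then () else s := x :: !s
  fun remove (s, x) = s := List.filter (fn y => y <> x) (!s)
  fun size s = length (!s)
end

val s = IntListMSet.empty ()
val () = IntListMSet.add (s, 1)
val () = IntListMSet.add (s, 1)        (* already present: s_post = s_pre *)
val () = IntListMSet.add (s, 2)
val sizeAfterAdds = IntListMSet.size s (* 2 *)
val () = IntListMSet.remove (s, 1)
val hasOne = IntListMSet.member (s, 1) (* false *)
```

The same name s denotes different abstract sets at different points in the program, which is exactly what the _pre and _post notation describes.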

Classifying operations

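When designing the interface to a mutable data abstraction, it is a good idea to select operations that fall into one of three broad categories:

  * Creators, which create new values of the abstraction.
  * Observers, which return information about an existing value without changing it.
  * Mutators, which change an existing value through side effects.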

This rule of thumb reduces the amount of reasoning that programmers need to do about side effects, because creators and observers usually do not have side effects; only mutators do.

The MSET signature contains examples of all three kinds of operations: empty and fromList are creators; member, size, fold, and toList are observers; and add and remove are mutators. Similarly, in ARRAY we have creators array and fromList, observers sub and length, and a mutator update.

Equality vs. similarity

Two things are equal if they can be substituted for each other in any context without any observable difference. For example, if I create two SML string values "hi" and "hi", there is no way to tell which one is being used, though in fact the SML implementation uses different pieces of memory to keep track of the characters in the two strings.

Mutable data abstractions break simple reasoning about equality. For example, two distinct expressions array(3, 0) cannot be substituted for each other; they will be distinguishable in some contexts. For example:

  let
    val a = Array.array(3,0)
    val b = Array.array(3,0)
  in
    Array.update(a,0,1);
    Array.sub(a,0) = Array.sub(b,0)   (* false: a was changed, b was not *)
  end

Here, the values referred to by a and b are not truly equal, because some expressions will have a different value when a is substituted for b. Therefore, we say that two mutable data objects are similar when their current state is abstractly equal. Data objects that are similar now may not be in the future.

Two mutable values are truly equal only if every change to one of the values affects the other one. In general this means that the refs inside them share the same location.
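SML's built-in = on arrays (and on refs) tests this kind of true equality, that is, whether both sides are the same mutable object, rather than comparing current contents. A minimal sketch, with names chosen for illustration:

```sml
val a = Array.array (3, 0)
val b = a                        (* alias: the same underlying cells as a *)
val c = Array.array (3, 0)       (* similar to a, but a distinct array *)

val aEqualsB = (a = b)           (* true: the same array *)
val aEqualsC = (a = c)           (* false: different arrays with equal contents *)

val () = Array.update (a, 0, 1)
val seenThroughB = Array.sub (b, 0)   (* 1: the change to a affects b *)
val seenThroughC = Array.sub (c, 0)   (* 0: c is unaffected *)
```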

Unfortunately, some languages, including Java, are a little confused about what equality means, which leads programmers to make mistakes. In some contexts in Java, the equals method means equality, and in others, similarity. So you have to be careful about this method. For example, think about what happens if you have a hash table storing mutable keys. If a key is mutated, it could break the data structure invariant of the hash table. It will not be possible to find that key using the Java equals method in its typical implementation as similarity.

Rep invariants

Mutable data abstractions need rep invariants, just like immutable abstractions do. However, mutation raises some new issues for maintaining rep invariants. For functional programming, we have to make sure that any new abstract values constructed satisfy the rep invariant. For imperative programming we also need to make sure that the rep invariant is not broken for existing abstract values.

For example, consider the following implementation of the MSET signature, in which an underlying sorted array is used as the representation:

functor ArrayMSet(structure Key: ORD_KEY)
        :> MSET where type elem = Key.ord_key
  = struct
    open Array
    type elem = Key.ord_key
    type set = {elems: elem option array ref, size: int ref}
    (* {elems, size} represents a set containing the first !size
     * elements in !elems.
     * Rep invariant: the first !size elements have the form SOME(e)
     * and they are in sorted order according to Key.compare.
     *)
    val initial_size = 10
    fun empty () = {elems = ref (array(initial_size, NONE)), size = ref 0}
    ...
    
  end

The idea is to create an array that is large enough to hold all the elements of the set. If more elements are added than the array can hold, a new, larger array is created. Only the first size elements of the array are actually used to store elements, and they are stored in sorted order. The member operation can be performed using binary search in O(lg n) time. However, add will not be as efficient, because insertion into the middle of the array will take O(n) time. Note that adding the element at the end of the array would take O(1) time but would break the rep invariant on the set that is being extended.
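The following is one possible sketch of member and add for this representation, specialized to int elements so that it runs on its own. It assumes the element count is kept in a ref so that add can update it. The helper find, the doubling growth policy, the tiny initial capacity of 2, and toList are choices made for this example rather than details given in these notes.

```sml
(* Hypothetical int-only version of the sorted-array representation. *)
type set = {elems: int option array ref, size: int ref}

fun empty () : set = {elems = ref (Array.array (2, NONE)), size = ref 0}

(* find (s, x) is (true, i) if x is stored at index i, and otherwise
 * (false, i) where i is the index at which x would have to be inserted.
 * Binary search over the first !size slots: O(lg n). *)
fun find ({elems, size}: set, x) =
  let
    val a = !elems
    fun loop (lo, hi) =
      if lo >= hi then (false, lo)
      else
        let val mid = (lo + hi) div 2
        in
          case Array.sub (a, mid) of
            NONE => raise Fail "rep invariant broken"
          | SOME y =>
              if x = y then (true, mid)
              else if x < y then loop (lo, mid)
              else loop (mid + 1, hi)
        end
  in
    loop (0, !size)
  end

fun member (s: set, x) = #1 (find (s, x))

(* Effects: s_post = s_pre U {x}. Shifting makes this O(n). *)
fun add (s: set, x) =
  case find (s, x) of
    (true, _) => ()
  | (false, pos) =>
      let
        val {elems, size} = s
        val n = !size
        val old = !elems
        (* Grow by doubling when full, copying the existing elements over. *)
        val a =
          if n < Array.length old then old
          else
            let val bigger = Array.array (2 * n, NONE)
            in Array.copy {src = old, dst = bigger, di = 0}; bigger end
        (* Shift the elements at pos..n-1 one slot right, starting from the right. *)
        fun shift i =
          if i > pos
          then (Array.update (a, i, Array.sub (a, i - 1)); shift (i - 1))
          else ()
      in
        shift n;
        Array.update (a, pos, SOME x);
        elems := a;
        size := n + 1
      end

fun toList ({elems, size}: set) =
  List.tabulate (!size, fn i => valOf (Array.sub (!elems, i)))

val s = empty ()
val () = List.app (fn x => add (s, x)) [5, 1, 3, 1, 9]
val sorted = toList s            (* [1, 3, 5, 9] *)
```

The rep invariant is restored before add returns: the new element is written into the gap opened by the shift, so the first !size slots are again SOME values in sorted order.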

Exposing the rep

A common mistake when designing a mutable abstraction is exposing the rep -- that is, implementing operations in a way that allows users access to mutable pieces of the representation. The problem is that these mutable values may then be updated by a thoughtless or malicious client, causing invalid changes to the original abstract value.

For example, suppose that we add an operation toArray: set -> elem option array to the mutable set interface. It looks very easy to implement in the functor above:

fun toArray({elems,size}) = !elems

This implementation simply returns the array that is used as part of the representation of the set. The problem is that a client receiving the array of elements can change the array using update, and in doing so break the rep invariant of the set the array was received from.
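One common remedy is to return a fresh copy instead of the array itself. The sketch below contrasts the two on an int-only version of the representation, assuming the element count is kept in a ref. The names toArrayUnsafe and toArrayCopy are made up for this example, and the copying version returns only the live elements.

```sml
type set = {elems: int option array ref, size: int ref}

(* Exposes the rep: the caller gets the very array the set uses. *)
fun toArrayUnsafe ({elems, size}: set) = !elems

(* Safer sketch: build a fresh array holding only the live elements. *)
fun toArrayCopy ({elems, size}: set) =
  Array.tabulate (!size, fn i => Array.sub (!elems, i))

val s : set = {elems = ref (Array.fromList [SOME 1, SOME 2, NONE]), size = ref 2}

val copy = toArrayCopy s
val () = Array.update (copy, 0, SOME 99)        (* harmless: only the copy changes *)
val stillSorted = Array.sub (!(#elems s), 0)    (* SOME 1 *)

val leaked = toArrayUnsafe s
val () = Array.update (leaked, 0, SOME 99)      (* breaks sorted order inside s *)
val corrupted = Array.sub (!(#elems s), 0)      (* SOME 99 *)
```

Copying costs O(n) per call, which is the price of keeping the rep invariant out of the client's reach.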

In early versions of Java, there was actually a security hole based on improperly exposing the rep. An array was returned containing security-critical information in response to a query; this array could be modified by applet code, changing the system security policy!

Aliasing

When a routine has side effects or manipulates mutable data structures, another important issue to consider in the specification is aliasing. Aliasing occurs when two different variables refer to the same underlying mutable data, so changes through one variable affect the other one. For example, consider a function that copies elements from one set to another:

(* copy(s1,s2): add all the elements of s2 into the set s1
 * Requires: s1 and s2 are different sets. *)
val copy: set * set -> unit

This function might easily be implemented in a way that causes it not to work when the inputs are aliased and the function is trying to copy elements from the set to itself. Therefore, in the specification for this function, we should specify whether the two sets are allowed to be the same set.
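The same hazard is easy to reproduce with plain arrays. The function reverseInto below is a hypothetical example, not the copy operation above: it meets its specification when its arguments are distinct and silently produces the wrong answer when they are aliased.

```sml
(* reverseInto (dst, src)
 * Effects: dst_post holds the elements of src_pre in reverse order.
 * Requires: dst and src are different arrays of the same length. *)
fun reverseInto (dst, src) =
  let
    val n = Array.length src
    fun loop i =
      if i < n
      then (Array.update (dst, i, Array.sub (src, n - 1 - i)); loop (i + 1))
      else ()
  in
    loop 0
  end

fun toList a = Array.foldr (op ::) [] a

val src = Array.fromList [1, 2, 3]
val dst = Array.array (3, 0)
val () = reverseInto (dst, src)
val distinctResult = toList dst        (* [3, 2, 1] *)

val a = Array.fromList [1, 2, 3]
val () = reverseInto (a, a)            (* violates the Requires clause *)
val aliasedResult = toList a           (* [3, 2, 3]: the 1 was overwritten before it was read *)
```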

Aliasing is the source of many bugs, because often programmers do not think enough about the possibility of aliasing.

Benign side effects

Not every side effect needs to be documented with an effects clause. If a side effect does not affect the abstract value that a mutable representation maps to, the side effect will be invisible to the user of the abstraction. Therefore, it need not be mentioned in the specification. Side effects of this sort are known as benign side effects, because they are not destructive. Benign side effects can be useful for building caches internal to data structures, or for data structure reorganizations that improve performance without affecting their abstract value.
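A memoizing function is a simple case of an internal cache. In the sketch below, the structure Fib is hypothetical: it updates a hidden ref on each new argument, yet its specification needs no effects clause because a client can only observe the returned numbers.

```sml
structure Fib :> sig
  (* fib n is the nth Fibonacci number. Requires: n >= 0. *)
  val fib : int -> int
end = struct
  (* Internal cache of (n, fib n) pairs; never visible to clients. *)
  val cache : (int * int) list ref = ref []

  fun fib n =
    if n < 2 then n
    else
      case List.find (fn (k, _) => k = n) (!cache) of
        SOME (_, v) => v
      | NONE =>
          let val v = fib (n - 1) + fib (n - 2)
          in cache := (n, v) :: !cache; v end
end

val first = Fib.fib 30       (* 832040, filling the cache along the way *)
val second = Fib.fib 30      (* 832040 again, answered from the cache *)
```

Because the signature is opaque and does not mention the cache, no client code can tell whether a result was computed or looked up.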