Once we add refs and arrays to SML, we can no longer reason about evaluation of programs as simply. Previously we could think about evaluation as involving a series of changes to a system configuration that consisted of just a single program term (i.e., expression). Each evaluation step took one subterm of the term and reduced it to a value. With imperative update, the system configuration comprises not only a term but also a store that records for each store location what value is in it.
In response to imperative updates, such as uses of the assignment operator :=, the store part of the configuration changes. So an evaluation step can affect both parts of the configuration. Reasoning about the changes to the store is more difficult than reasoning about the changes to the program term because where in the store the update happens is determined by locations that happen to have been computed in the term. In practice this leads to a lot of bugs, so it is a good idea to use imperative update in a limited, careful way.
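As a small illustration of an update whose store location is computed by the term, consider the following sketch. The names x, y, pick and r are hypothetical and chosen only for this example.

```sml
(* Two ref cells occupy two distinct store locations. *)
val x = ref 0
val y = ref 0

(* The location that gets updated is computed by the term:
   pick returns either x or y depending on a run-time test. *)
fun pick (b : bool) : int ref = if b then x else y

val r = pick true
val () = r := 42      (* updates the location shared by r and x *)

val xNow = !x         (* 42 *)
val yNow = !y         (* 0 *)
```

To predict the final store, the reader has to know which location pick returned, which is exactly the extra reasoning that imperative update demands.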
Mutable data abstractions are abstractions whose value can change over time. We have avoided using them until now because they are harder to reason about than immutable (functional) data abstractions. But for solving some problems they offer an advantage in efficiency.
An important kind of mutable data structure that SML provides is the array. The type t array is in fact very similar to the Java array type t[]. Arrays generalize refs in that they are a sequence of mutable cells containing values. We can think of a ref cell as an array of size 1. Here's a partial signature for the builtin Array structure for SML, including specifications:
signature ARRAY =
  sig
    (* Overview: an 'a array is a mutable fixed-length sequence of
     * elements of type 'a. *)
    type 'a array

    (* array(n,x) is a new array of length n whose elements are
     * all equal to x. *)
    val array : int * 'a -> 'a array

    (* fromList(lst) is a new array containing the values in lst *)
    val fromList : 'a list -> 'a array

    exception Subscript (* indicates an out-of-bounds array index *)

    (* sub(a,i) is the ith element in a. If i is
     * out of bounds, raise Subscript *)
    val sub : 'a array * int -> 'a

    (* update(a,i,x)
     * Effects: Set the ith element of a to x
     * Raise Subscript if i is not a legal index into a *)
    val update : 'a array * int * 'a -> unit

    (* length(a) is the length of a *)
    val length : 'a array -> int
    ...
  end
See the SML documentation for more information on the operations available on arrays.
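The operations in this signature can be exercised as in the short sketch below, which uses the Basis Library Array structure directly. The variable names are made up for the example.

```sml
val a = Array.array (3, 0)          (* a new array holding 0, 0, 0 *)
val () = Array.update (a, 1, 7)     (* Effects: a now holds 0, 7, 0 *)
val second = Array.sub (a, 1)       (* 7 *)
val n = Array.length a              (* 3 *)

(* An out-of-bounds index raises Subscript. *)
val outOfBounds =
  (Array.sub (a, 3); false) handle Subscript => true
```

Note that update returns unit; its only purpose is its side effect on a.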
Notice that we have started using a new kind of clause in the specification,
the effects clause. This clause specifies side effects that the operation
has beyond the value it returns. When a routine has a side effect, there should
be an "Effects:" clause to explicitly warn
the user that a side effect may occur. For example, the update
function returns no interesting value, but it does have a side effect.
An imperative update to a mutable data abstraction is also known as a destructive update, because it "destroys" the old value of the data structure. An assignment to an array element changes the array in place, destroying the old sequence of elements that formerly made up the array. When a destructive operation is performed on a mutable data abstraction, it looks to the client like an imperative assignment is performed, changing the abstraction to refer to a new value.
Programming in an imperative style is trickier than in a functional style exactly because the programmer has to be sure that the old value of the mutable data is no longer needed at the time that a destructive update is performed. In general it's hard to know whether there might be another reference to the data somewhere a side effect wasn't expected.
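The following sketch shows how a destructive update can surprise code that still holds a reference to the old value. The function reverseInPlace and the variable names are hypothetical.

```sml
(* Effects: reverses the elements of a in place. *)
fun reverseInPlace (a : int array) : unit =
  let
    val n = Array.length a
    fun swap (i, j) =
      let val t = Array.sub (a, i)
      in Array.update (a, i, Array.sub (a, j)); Array.update (a, j, t) end
    fun loop i =
      if i < n div 2 then (swap (i, n - 1 - i); loop (i + 1)) else ()
  in
    loop 0
  end

val original = Array.fromList [1, 2, 3]
val alsoOriginal = original               (* a second reference to the same array *)
val () = reverseInPlace original
val first = Array.sub (alsoOriginal, 0)   (* 3, not 1: the old sequence is gone *)
```

Any code that kept alsoOriginal expecting to see 1, 2, 3 is now broken, even though it never called reverseInPlace itself.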
Mutable collections such as sets and maps are another important kind of mutable data abstraction. We've seen several different implementations of sets thus far, but they have implemented an immutable set abstraction. A mutable set is a set that can be imperatively updated to include more elements, or to remove some elements.
Here is an example of a signature for a mutable set.
These signatures show an important issue in writing effects clauses. To specify a side effect, sometimes we need to be able to talk about the state of a mutable value both before and after the routine is executed. Writing "_pre" or "_post" after the name of a variable is a compact way of talking about the state of the value in that variable before and after the function executes, respectively.
signature MSET =
  sig
    (* Overview: a set is a mutable set of items of type elem.
     * For example, if elem is int, then a set might be
     * {1,-11,0}, {}, or {1001} *)
    type elem
    type set

    (* empty() creates a new empty set *)
    val empty : unit -> set

    (* Effects: add(s,x) adds the element x to s if it is
     * not there already: s_post = s_pre U {x} *)
    val add: set * elem -> unit

    (* Effects: remove(s,x) removes the element x from s if it is
     * there already: s_post = s_pre - {x} *)
    val remove: set * elem -> unit

    (* member(s,x) is whether x is a member of s *)
    val member: set * elem -> bool

    (* size(s) is the number of elements in s *)
    val size: set -> int

    (* fold over the elements of the set *)
    val fold: ((elem*'b)->'b) -> 'b -> set -> 'b

    val fromList: elem list -> set
    val toList: set -> elem list
  end
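As a rough sketch of how such effects clauses translate into code, here is a hypothetical list-based structure, IntListMSet, that provides a few of the MSET operations for integer elements. It is not ascribed to MSET because it omits fold, fromList and toList, and it is only one of many possible representations.

```sml
structure IntListMSet =
struct
  type elem = int
  type set = elem list ref          (* rep invariant: no duplicates *)

  fun empty () : set = ref []

  fun member (s : set, x : elem) : bool =
    List.exists (fn y => y = x) (!s)

  (* Effects: s_post = s_pre U {x} *)
  fun add (s : set, x : elem) : unit =
    if member (s, x) then () else s := x :: !s

  (* Effects: s_post = s_pre - {x} *)
  fun remove (s : set, x : elem) : unit =
    s := List.filter (fn y => y <> x) (!s)

  fun size (s : set) : int = List.length (!s)
end

val s = IntListMSet.empty ()
val () = IntListMSet.add (s, 1)
val () = IntListMSet.add (s, 1)     (* no effect: 1 is already present *)
val () = IntListMSet.add (s, 5)
val () = IntListMSet.remove (s, 1)
val result = (IntListMSet.member (s, 5), IntListMSet.size s)   (* (true, 1) *)
```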
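When designing the interface to a mutable data abstraction, it is a good idea to select operations that fall into one of three broad categories:
Creators, which create new values of the abstraction. A creator does not behave like a mathematical function, because each call returns a distinct value. For example, two calls to Array.fromList will produce two distinguishable arrays even if the same list is provided as input. The two arrays can be distinguished because updates to one array will not be visible in the other.
Observers, which report on the current state of a value without changing it.
Mutators, which change the state of a value and usually return nothing interesting.
This rule of thumb reduces the amount of reasoning that programmers need to do about side effects, because creators and observers usually do not have side effects; only mutators do.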
The MSET signature contains examples of all three kinds of operations: empty and fromList are creators; member, size, fold, and toList are observers; and add and remove are mutators. Similarly, in ARRAY we have creators array and fromList, observers sub and length, and a mutator update.
Two things are equal if they can be substituted for each other in any context without any observable difference. For example, if I create two SML string values "hi" and "hi", there is no way to tell which one is being used, though in fact the SML implementation uses different pieces of memory to keep track of the characters in the two strings.
Mutable data abstractions break simple reasoning about equality. For example, two distinct expressions array(3, 0) cannot be substituted for each other; they will be distinguishable in some contexts. For example:
let
  val a = Array.array(3,0)
  val b = Array.array(3,0)
in
  Array.update(a,0,1);
  Array.sub(a,0) = Array.sub(b,0)   (* false: a was updated, b was not *)
end
Here, the values referred to by a and b are not truly equal, because some expressions will have a different value when a is substituted for b. Therefore, we say that two mutable data objects are similar when their current state is abstractly equal. Data objects that are similar now may not be in the future.
Two mutable values are truly equal only if every change to one of the values affects the other one. In general this means that the refs inside them share the same location.
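The difference between equality and similarity can be observed directly in SML, where the = operator on arrays compares locations rather than contents. In the sketch below, similar is a hypothetical helper that compares current contents; the other names are likewise chosen for illustration.

```sml
(* similar(a,b) is whether a and b currently hold the same elements. *)
fun similar (a : int array, b : int array) : bool =
  Array.foldr (op ::) [] a = Array.foldr (op ::) [] b

val a = Array.array (3, 0)
val b = Array.array (3, 0)
val c = a                         (* c names the same array as a *)

val similarBefore = similar (a, b)   (* true: same current state *)
val equalAB = (a = b)                (* false: different locations *)
val equalAC = (a = c)                (* true: same location *)

val () = Array.update (a, 0, 1)
val similarAfter = similar (a, b)    (* false: similarity did not last *)
val seenThroughC = Array.sub (c, 0)  (* 1: the change is visible through c *)
```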
Unfortunately, some languages, including Java, are a little confused about what equality means, which leads programmers to make mistakes. In some contexts in Java, the equals method means equality, and in others, similarity. So you have to be careful about this method. For example, think about what happens if you have a hash table storing mutable keys. If a key is mutated, it could break the data structure invariant of the hash table. It will not be possible to find that key using the Java equals method in its typical implementation as similarity.
Mutable data abstractions need rep invariants, just like immutable abstractions do. However, mutation raises some new issues for maintaining rep invariants. For functional programming, we have to make sure that any new abstract values constructed satisfy the rep invariant. For imperative programming we also need to make sure that the rep invariant is not broken for existing abstract values.
For example, consider the following implementation of the MSET signature, in which an underlying sorted array is used as the representation:
functor ArrayMSet(structure Key: ORD_KEY)
  :> MSET where type elem = Key.ord_key =
  struct
    open Array
    type elem = Key.ord_key
    type set = {elems: elem option array ref, size: int ref}
    (* {elems, size} represents a set containing the first !size
     * elements in !elems.
     * Rep invariant: the first !size elements have the form SOME(e)
     * and they are in sorted order according to Key.compare. *)
    val initial_size = 10
    fun empty () = {elems = ref (array(initial_size, NONE)), size = ref 0}
    ...
  end
The idea is to create an array that is large enough to hold all the elements of the set. If too many add's are done, a new array is created. Only the first size elements of the array are actually used to store elements, and they are stored in sorted order. The member operation can be performed using binary search with O(lg n) time. However, add will not be as efficient, because insertion into the middle of the array will take O(n) time. Note that adding the element at the end of the array would take O(1) time but would break the rep invariant on the set that is being extended.
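The member and add operations described above might be sketched as follows. This is a simplified, standalone version under stated assumptions: elements are plain ints compared with <, the size field is an int ref, the initial capacity is 4, and find is a hypothetical helper. It is not the elided body of the functor.

```sml
type set = {elems: int option array ref, size: int ref}

fun empty () : set = {elems = ref (Array.array (4, NONE)), size = ref 0}

(* find(s,x) is the index of the first element >= x among the first
   !size elements, or !size if there is none. Binary search, O(lg n). *)
fun find ({elems, size} : set, x : int) : int =
  let
    fun search (lo, hi) =
      if lo >= hi then lo
      else
        let val mid = (lo + hi) div 2
        in
          case Array.sub (!elems, mid) of
            SOME e => if e < x then search (mid + 1, hi) else search (lo, mid)
          | NONE => raise Fail "rep invariant broken"
        end
  in
    search (0, !size)
  end

fun member (s as {elems, size} : set, x : int) : bool =
  let val i = find (s, x)
  in i < !size andalso Array.sub (!elems, i) = SOME x end

(* Effects: s_post = s_pre U {x}. Takes O(n) time because of shifting. *)
fun add (s as {elems, size} : set, x : int) : unit =
  if member (s, x) then ()
  else
    let
      val i = find (s, x)
      val n = !size
      (* When the current array is full, switch to a new, larger one. *)
      val () =
        if n = Array.length (!elems) then
          let val old = !elems
          in
            elems := Array.tabulate
                       (2 * n, fn j => if j < n then Array.sub (old, j) else NONE)
          end
        else ()
      (* Shift elements i..n-1 one slot to the right, last one first. *)
      fun shift j =
        if j > i then
          (Array.update (!elems, j, Array.sub (!elems, j - 1)); shift (j - 1))
        else ()
    in
      shift n;
      Array.update (!elems, i, SOME x);
      size := n + 1
    end
```

Every path through add leaves the first !size slots filled and sorted, so the rep invariant holds again by the time add returns, even though it is temporarily violated while elements are being shifted.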
A common mistake when designing a mutable abstraction is exposing the rep -- that is, implementing operations in a way that allows users access to mutable pieces of the representation. The problem is that these mutable values may then be updated by a thoughtless or malicious client, causing invalid changes to the original abstract value.
For example, suppose that we add an operation toArray: set -> elem option array to the mutable set interface. It looks very easy to implement in the functor above:
fun toArray({elems,size}) = !elems
This implementation simply returns the array that is used as part of the representation of the set. The problem is that a client receiving the array of elements can change the array using update, and in doing so break the rep invariant of the set the array was received from.
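One common remedy is to return a fresh copy instead of the representation itself. The sketch below contrasts the two approaches on a standalone version of the representation with int elements and an int ref size field; toArrayUnsafe and toArraySafe are hypothetical names.

```sml
type set = {elems: int option array ref, size: int ref}

(* Exposes the rep: the caller receives the very array the set uses. *)
fun toArrayUnsafe ({elems, ...} : set) : int option array = !elems

(* Returns a new array holding the first !size slots, so updates made
   by the client cannot reach the representation. *)
fun toArraySafe ({elems, size} : set) : int option array =
  Array.tabulate (!size, fn i => Array.sub (!elems, i))

val s : set = {elems = ref (Array.fromList [SOME 1, SOME 2, NONE]), size = ref 2}

val copy = toArraySafe s
val () = Array.update (copy, 0, SOME 99)
val afterSafe = Array.sub (!(#elems s), 0)      (* still SOME 1 *)

val leaked = toArrayUnsafe s
val () = Array.update (leaked, 0, SOME 99)
val afterUnsafe = Array.sub (!(#elems s), 0)    (* SOME 99: sorted order is broken *)
```

The copy costs O(n) time and space, which is the price of keeping the rep invariant under the abstraction's own control.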
In early versions of Java, there was actually a security hole based on improperly exposing the rep. An array was returned containing security-critical information in response to a query; this array could be modified by applet code, changing the system security policy!
When a routine has side effects or manipulates mutable data structures, another important issue to consider in the specification is aliasing. Aliasing occurs when two different variables refer to the same underlying mutable data, so changes through one variable affect the other one. For example, consider a function that copies elements from one set to another:
(* copy(s1,s2): add all the elements of s2 into the set s1
 * Requires: s1 and s2 are different sets. *)
val copy: set * set -> unit
This function might easily be implemented in a way that causes it not to work when the inputs are aliased and the function is trying to copy elements from the set to itself. Therefore, in the specification for this function, we should specify whether the two sets are allowed to be the same set.
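The same hazard is easy to reproduce on a simpler array routine. The function reverseInto below is hypothetical and is not part of the set interface; it behaves as specified on distinct arrays but gives a wrong answer when both arguments are the same array.

```sml
(* reverseInto(dst, src)
   Requires: dst and src have the same length and are different arrays.
   Effects: dst_post is the reverse of src. *)
fun reverseInto (dst : int array, src : int array) : unit =
  let
    val n = Array.length src
    fun loop i =
      if i < n then
        (Array.update (dst, n - 1 - i, Array.sub (src, i)); loop (i + 1))
      else ()
  in
    loop 0
  end

fun toList (a : int array) : int list = Array.foldr (op ::) [] a

(* Distinct arrays: the result is the reverse, as specified. *)
val src = Array.fromList [1, 2, 3, 4]
val dst = Array.array (4, 0)
val () = reverseInto (dst, src)
val ok = toList dst                 (* [4, 3, 2, 1] *)

(* Aliased arguments: early writes overwrite elements not yet read. *)
val a = Array.fromList [1, 2, 3, 4]
val () = reverseInto (a, a)
val aliased = toList a              (* [1, 2, 2, 1], not [4, 3, 2, 1] *)
```

The Requires clause is what makes the second call the client's error rather than a bug in the implementation.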
Aliasing is the source of many bugs, because often programmers do not think enough about the possibility of aliasing.
Not every side effect needs to be documented with an effects clause. If a side effect does not affect the abstract value that a mutable representation maps to, the side effect will be invisible to the user of the abstraction. Therefore, it need not be mentioned in the specification. Side effects of this sort are known as benign side effects, because they are not destructive. Benign side effects can be useful for building caches internal to data structures, or for data structure reorganizations that improve performance without affecting their abstract value.
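A move-to-front lookup is one example of such a reorganization. In the sketch below, which assumes a hypothetical list-based representation of an integer set, member reorders the list so that recently found elements are found faster next time, yet the set of elements it represents never changes.

```sml
(* Abstract value: the set of elements in the list.
   Rep invariant: no duplicates. *)
type set = int list ref

(* member(s,x) is whether x is in s.
   No Effects clause is needed: the list may be reordered, but the
   set it represents is unchanged. *)
fun member (s : set, x : int) : bool =
  if List.exists (fn y => y = x) (!s) then
    (s := x :: List.filter (fn y => y <> x) (!s); true)
  else false

val s : set = ref [1, 2, 3]
val found = member (s, 3)
val repNow = !s                                            (* [3, 1, 2] *)
val sameSet = List.all (fn y => member (s, y)) [1, 2, 3]   (* true *)
```

A client that can only call member cannot tell that the list was rearranged, so the side effect is benign.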