CS 312 Lecture 13
Assignment and Side Effects

Thus far, we've been working with a functional subset of SML.  That is, we've been working with the parts of the language that do not include computational effects (or side effects) other than printing and exceptions.  In particular, whenever we coded a function, we never changed variables or data.  Rather, we always computed new data.  For instance, when we wrote code for an abstract data type such as a stack, queue, or tree, the operations to insert an item into the data structure didn't affect the existing copy of the data structure.  Instead, we always built a new data structure with the item in the appropriate place.  (Note that the new data structure might refer to the old data structure, so this isn't always as memory-inefficient as it first sounds.)

For the most part, coding in a functional style (i.e., without side effects) is a "good thing" because it's easier to reason locally about the behavior of the code.  For instance, when we code purely functional queues or stacks, we don't have to worry about a non-local change to a queue or stack.  However, it is often more efficient or clearer to destructively modify a data structure than to build a new version.  In these situations, we need some form of mutable data structures.

Like most imperative programming languages, SML provides support for mutable data structures, but unlike languages such as C, C++, or Java, they are not the default.  Thus, programmers are encouraged to code purely functionally by default and to only resort to mutable data structures when absolutely necessary.  In addition, unlike most imperative languages, SML provides no support for mutable variables.  In other words, the value of a variable cannot change in SML.  Rather, all mutations must occur through data structures.
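
The difference between rebinding a variable and mutating data can be sketched as follows. This is only an illustration; the names getX, getR, fromClosure, current, and fromCell are made up for this example.

```sml
(* A val binding never changes.  A second binding of the same
 * name creates a new variable that shadows the first one. *)
val x = 1
fun getX () = x             (* refers to the binding x = 1 *)
val x = 2                   (* a new x; the old one is untouched *)
val fromClosure = getX ()   (* still 1 *)
val current = x             (* 2 *)

(* Mutation has to go through a data structure, here a ref cell. *)
val r = ref 1
fun getR () = !r            (* reads the cell each time it is called *)
val () = r := 2             (* changes the contents of the cell *)
val fromCell = getR ()      (* 2 *)
```

Here getX keeps returning 1 because the variable it refers to never changes, whereas getR sees the update because it reads the contents of the cell at call time.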

Refs

There are only two built-in mutable data structures in SML:  refs and arrays. Today we will talk about refs. SML supports imperative programming through the primitive parameterized ref type. A value of type "int ref" is a pointer to a location in memory, where the location in memory contains an integer.  It's analogous to "int*" in C/C++ or "Integer" in Java (but not "int" in Java).   Like lists, refs are polymorphic, so in fact, we can have a ref (i.e., pointer) to a value of any type. 
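
As a small sketch of that polymorphism, the bindings below create refs to values of several different types. All of the names (count, name, items, step, result) are illustrative only.

```sml
val count : int ref = ref 0
val name  : string ref = ref "stack"
val items : int list ref = ref [1, 2, 3]

(* A ref can even hold a function. *)
val step : (int -> int) ref = ref (fn n => n + 1)

(* Replace the stored function, then call whatever is stored now. *)
val () = step := (fn n => n * 2)
val result = (!step) 21     (* 42 *)
```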

A partial signature for refs is below:

  signature REF = 
    sig
      type 'a ref

      (* ref(x) creates a new ref containing x *)
      val ref : 'a -> 'a ref     

      (* !x is the contents of the ref cell x *)
      val op ! : 'a ref -> 'a     

      (* Effects: x := y updates the contents of x
       * so it contains y. *)
      val op := : 'a ref * 'a -> unit
    end

A ref is like a box that can store a single value. By using the := operator, the value in the box can be changed as a side effect.  That is, what matters about := is not the value it returns but the effect it has.  The type of := is 'a ref * 'a -> unit. This type makes clear that the assignment operator is interesting only for its side effects, since it computes unit, a value that we know a priori. The expression rv := e evaluates expression e, stores its value v in memory, and then stores a reference to v in rv. It is important to distinguish between the value that is stored in the box and the box itself. A ref is the simplest mutable data structure. A mutable data structure is one that can be changed imperatively, or mutated.
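
The distinction between a box and its contents can be sketched with equality tests. In SML, = on refs compares the boxes themselves, not what they hold. The names below are made up for this example.

```sml
val a = ref 7
val b = ref 7                  (* a second, separate box that also holds 7 *)
val c = a                      (* another name for the first box *)

val sameContents = (!a = !b)   (* true: both boxes hold 7 *)
val sameBox = (a = b)          (* false: two different boxes *)
val aliased = (a = c)          (* true: a and c are the same box *)

val u : unit = a := 3          (* the value of an assignment is always () *)
val viaAlias = !c              (* 3, since c is the same box as a *)
val untouched = !b             (* 7, since b is a different box *)
```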

We can use box diagrams to represent references and the values they point to. A box with an R in it will represent a reference. Taking a reference will be equivalent to creating a reference box with an outgoing arrow that points to the value referred to. Given this convention, the ! operator corresponds to following the outgoing arrow of a reference box to the value it points to. The assignment operator := changes the outgoing arrow of a reference box by making it point to the new value. Here are some examples:

            +---+          +---+
 x -------> |(R)| -------> | 7 |     val x = ref 7
            +---+          +---+


            +---+          +---+     (* x, y point to SAME BOX *)
 x -------> |(R)| -------> | 7 |     val y = x
            +---+          +---+
              ^
              |
 y -----------+


            +---+          +---+     (* assigning through x changes what y sees, too *)
 x -------> |(R)| ---+     | 7 |     x := 3
            +---+    |     +---+
              ^      |
              |      |     +---+
 y -----------+      +---> | 3 |
                           +---+

The following code shows a simple example where we use  ref:

    let val x : int ref = ref 3
        val y : int = !x
    in
        x := (!x) + 1;
        y + (!x)
    end

The code above evaluates to 7.  Let's see why:  The first line "val x:int ref = ref 3" creates a new ref cell, initializes the contents to 3, and then returns a reference (i.e., pointer) to the cell and binds it to x.  The second line "val y:int = !x" reads the contents of the cell referenced by x, returns 3, and then binds it to y.  The third line "x := (!x) + 1;" evaluates "!x" to get 3, adds one to it to get 4, and then sets the contents of the cell referenced by x to this value.  The fourth line "y + (!x)" returns the sum of the values y (i.e., 3) and the contents of the cell referenced by x (4).  Thus, the whole expression evaluates to 7.  Note, however, that if we made y=x rather than !x, then as in the example above they would refer to the same "location" and the value would be 8:

    let val x : int ref = ref 3
        val y : int ref = x
    in
        x := (!x) + 1;
        (!y) + (!x)
    end

Local State: Using refs to "package" state in closures

Consider the following use of refs:

 
datatype Action = Deposit | Withdrawl
 
fun mkacct(initial:int):(Action*int)->int =
  let val balance:int ref = ref initial
    in 
      fn(a:Action,x:int) =>
      (print("Balance was " ^ Int.toString(!balance) ^"\n");
       (case a of 
         Deposit => balance := !balance + x
       | Withdrawl => if (!balance < x) 
                       then raise Fail "Overdrawn!"
                      else balance := !balance - x);
          print("New balance: " ^ Int.toString(!balance) ^"\n");
          !balance)
  end
 

Creating two accounts and using them at the SML prompt produces the following output:

- val acct2 = mkacct 100;
val acct2 = fn : Action * int -> int
- val acct1 = mkacct 100;
val acct1 = fn : Action * int -> int
- acct1(Deposit,10);
Balance was 100
New balance: 110
val it = 110 : int
- acct2(Withdrawl,10);
Balance was 100
New balance: 90
val it = 90 : int

Note the loss of referential transparency.  That is, we no longer know that f(x) always produces the same value for a given x.  There is state that causes two successive calls acct1(Deposit,10) to return different values.  However, now we can create separate "objects" acct1 and acct2 that are independent, each keeping its own state packaged up.  In contrast, in a purely functional style, a function always had to return the new "object", because an existing one can never be changed.
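
A smaller sketch of the same two points, using a counter instead of an account. The helper mkcounter and the other names here are hypothetical and not part of the lecture code.

```sml
(* mkcounter() returns a function that hides its own ref cell. *)
fun mkcounter () : unit -> int =
  let val n = ref 0
  in
    fn () => (n := !n + 1; !n)
  end

val next = mkcounter ()
val first = next ()         (* 1 *)
val second = next ()        (* 2: same argument, different result *)

(* A second counter has its own, independent state. *)
val other = mkcounter ()
val otherFirst = other ()   (* 1 *)

(* A pure function, by contrast, always agrees with itself. *)
fun double (x:int) : int = x + x
val same = (double 5 = double 5)   (* true *)
```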

This kind of packaging up of state information by creating a higher order function that knows how to handle certain operations to change or return the state should be reminiscent of something you are familiar with in other languages.  Where have you seen the idea of an object that handles certain operations?  Object-oriented programming (without the inheritance part).  In effect, simple object oriented programming can be implemented using higher order procedures, and local state variables.
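
One way to make the analogy with objects more explicit is to return a record of functions ("methods") that all share one hidden ref. This is only a sketch of the idea; the type account, the function mkaccount, and the field names are assumptions made for this example, not the lecture's mkacct.

```sml
(* An "object" is a record of operations over hidden state. *)
type account = { deposit  : int -> unit,
                 withdraw : int -> unit,
                 balance  : unit -> int }

fun mkaccount (initial:int) : account =
  let val bal = ref initial    (* visible only to the three closures below *)
  in
    { deposit  = fn x => bal := !bal + x,
      withdraw = fn x => if !bal < x
                         then raise Fail "Overdrawn!"
                         else bal := !bal - x,
      balance  = fn () => !bal }
  end

val acct = mkaccount 100
val () = #deposit acct 10
val () = #withdraw acct 30
val final = #balance acct ()   (* 80 *)
```

Each call to mkaccount allocates a fresh ref, so two accounts never share a balance, much like two instances of a class.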

This packaging up of a function with local data is called a closure.  We will return to this in more detail when we talk about the environment model for reasoning about programs with side effects.

When writing code with mutable data (using the assignment operator), it is best to localize the data that will be modified.  This is often referred to as local state.  The above code is a simple example of using local state.  While balance is bound to a ref, and its contents can thus be modified, it is not exposed outside the function that is returned by mkacct.  Thus it is not possible for arbitrary code to modify balance; only the code that is local to mkacct (more precisely, the function returned by mkacct) can do so.  Some of the most obscure bugs in programs come from code that modifies some variable in an unexpected way.  Thus, whenever programming with mutable data, in whatever language, it is a good idea to use the facilities of the language to "hide" or "package up" the mutable state so that it can only be modified locally.  This is an IMPORTANT programming practice.

Mutable Data Abstractions

Here's an example of a mutable stack built using refs:

    signature MUTABLE_STACK = 
      sig
         (* An 'a mstack is a mutable stack of 'a elements *)
         type 'a mstack
         (* new(x) is a new stack with one item, x *)
         val new : 'a -> 'a mstack
         (* Effects: push(m,x) pushes x onto m *)
         val push : 'a mstack * 'a -> unit
         (* pop(m) is SOME(x), where x is the top of m,
          * or NONE if m is empty.
          * Effects: removes x from the stack, if present. *)
         val pop : 'a mstack -> 'a option
      end

    structure MStack :> MUTABLE_STACK =
      struct
         (* A mutable stack is a reference
          * to the list of values, with the top
          * of the stack at the head. *)
         type 'a mstack = ('a list) ref
         fun new(x:'a):'a mstack = ref([x])
         fun push(s:'a mstack, x:'a):unit = 
             s := x::(!s)
         fun pop(s:'a mstack):'a option = 
             case (!s) of
               [] => NONE
             | hd::tl => (s := tl; SOME(hd))
      end

Note the use of "effects" clauses to document the effect that various operations have, when the operations are used for effect instead of or in addition to value.  Push is used for effect only (it returns a value of type unit), whereas pop is used both for effect and for value.

A good exercise is to consider mutable versus immutable (functional) versions of queues, priority queues, balanced trees, or any other data structure that we've seen in class thus far.
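
As a starting point for that comparison, here is a minimal sketch that contrasts a functional push with a mutable one, using plain lists and list refs rather than the MStack structure. The names fpush and mpush are hypothetical.

```sml
(* Functional version: push returns a new stack;
 * the old stack is unchanged. *)
fun fpush (s:'a list, x:'a) : 'a list = x :: s

val s0 = [1]
val s1 = fpush (s0, 2)      (* s1 is [2,1]; s0 is still [1] *)

(* Mutable version: push updates the stack in place. *)
fun mpush (s:'a list ref, x:'a) : unit = s := x :: (!s)

val m = ref [1]
val alias = m               (* a second name for the same stack *)
val () = mpush (m, 2)       (* now !m and !alias are both [2,1] *)
```

With the functional version, any code holding s0 is unaffected by the push. With the mutable version, every holder of the ref sees the change, which is exactly the non-local behavior that has to be documented in an effects clause.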