# Mutable Data Types
* * *
Topics:
* refs
* mutable fields
* arrays
* mutable data structures
* * *
## Mutable Data Types
OCaml is not a *pure* language: it does admit side effects. We have
seen that already with I/O, especially printing. But up till now we
have limited ourselves to the subset of the language that is *immutable*:
values could not change. Today, we look at data types that are
mutable.
Mutability is neither good nor bad. It enables new functionality that
we couldn't implement (at least not easily) before, and it enables
us to create certain data structures that are asymptotically more
efficient than their purely functional analogues. But mutability
does make code more difficult to reason about, hence it is a source
of many faults in code. One reason for that might be that humans
are not good at thinking about change. With immutable values,
we're guaranteed that any fact we might establish about them
can never change. But with mutable values, that's no longer true.
"Change is hard," as they say.
## Refs
A *ref* is like a pointer or reference in an imperative language.
It is a location in memory whose contents may change. Refs
are also called *ref cells*, the idea being that there's a cell
in memory that can change.
**A first example.**
Here's an example utop transcript to introduce refs:
```
# let x = ref 0;;
val x : int ref = {contents = 0}
# !x;;
- : int = 0
# x := 1;;
- : unit = ()
# !x;;
- : int = 1
```
At a high level, what that shows is creating a ref, getting the value from inside it,
changing its contents, and observing the changed contents. Let's dig a little deeper.
The first phrase, `let x = ref 0`, creates a reference using the `ref` function.
That's a location in memory whose contents are initialized to `0`. Think of the
location itself as being an address, for example 0x3110bae0, even though
there's no way to write down such an address in an OCaml program. Applying
`ref` is what causes the memory location to be allocated and initialized.
The first part of the response from utop, `val x : int ref`, indicates
that `x` is a variable whose type is `int ref`. We have a new type
constructor here. Much like `list` and `option` are type constructors,
so is `ref`. A `t ref`, for any type `t`, is a reference to a memory
location that is guaranteed to contain a value of type `t`. As usual,
we should read a type from right to left: `t ref` means a
reference to a `t`.
The second part of the response shows us the contents of the memory
location. Indeed, the contents have been initialized to `0`.
The second phrase, `!x`, dereferences `x` and returns the contents
of the memory location. Note that `!` is the dereference operator
in OCaml, not Boolean negation.
The third phrase, `x := 1`, is an assignment. It mutates the contents
of `x` to be `1`. Note that `x` itself still points to the same location
(i.e., address) in memory. Variables really are immutable in that way.
What changes is the contents of that memory location. Memory is
mutable; variable bindings are not. The response from utop is simply
`()`, meaning that the assignment took place—much like printing
functions return `()` to indicate that the printing did happen.
The fourth phrase, `!x`, again dereferences `x` to demonstrate that
the contents of the memory location did indeed change.
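The distinction between an immutable binding and mutable contents can be checked outside of utop as well. The following is a minimal sketch; the names `before` and `after` are illustrative and not part of the transcript above.

```ocaml
(* The binding [x] never changes; only the contents of its cell do. *)
let x = ref 0            (* allocate a cell holding 0 *)
let before = !x          (* read the contents: 0 *)
let () = x := 1          (* overwrite the contents of the same cell *)
let after = !x           (* read again: 1 *)

(* [before] is an ordinary immutable int, so it keeps the old value. *)
let () = Printf.printf "before = %d, after = %d\n" before after
```

Note that `before` does not change when the cell is updated: dereferencing copies out the value that was in the cell at that moment.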
**A more sophisticated example.**
Here is code that implements a *counter*. Every time `next_val`
is called, it returns one more than the previous time.
```
# let counter = ref 0;;
val counter : int ref = {contents = 0}
# let next_val =
    fun () ->
      counter := (!counter) + 1;
      !counter;;
val next_val : unit -> int = <fun>
# next_val();;
- : int = 1
# next_val();;
- : int = 2
# next_val();;
- : int = 3
```
In the implementation of `next_val`, there are two expressions
separated by a semicolon. The first expression, `counter := (!counter) + 1`,
is an assignment that increments `counter` by 1. The second
expression, `!counter`, returns the newly incremented contents
of `counter`.
This function is unusual in that every time we call it, it returns
a different value. That's quite different than any of the functions
we've implemented ourselves so far, which have always been
*deterministic*: for a given input, they always produced the same output.
On the other hand, we've seen some library functions that
are *nondeterministic*, for example, functions in the `Random` module,
and `Pervasives.read_line` (the `Pervasives` module is named `Stdlib` in
recent OCaml releases). It's no coincidence that those happen to be
implemented using mutable features.
We could improve our counter in a couple ways. First, there is a
library function `incr : int ref -> unit` that increments an `int ref`
by 1. Thus it is like the `++` operator in many languages in the
C family. Using it, we could write `incr counter` instead of
`counter := (!counter) + 1`.
Second, the way we coded the counter currently exposes the `counter`
variable to the outside world. Maybe we'd prefer to hide it so
that clients of `next_val` can't directly change it. We could
do so by nesting `counter` inside the scope of `next_val`:
```
let next_val =
  let counter = ref 0
  in fun () ->
    incr counter;
    !counter
```
Now `counter` is in scope inside of `next_val`, but not accessible
outside that scope.
When we gave the dynamic semantics of let expressions before,
we talked about substitution. One way to think about the definition
of `next_val` is as follows.
* First, the expression `ref 0` is evaluated. That returns a location
`loc`, which is an address in memory. The contents of that address
are initialized to `0`.
* Second, everywhere in the body of the let expression that `counter`
occurs, we substitute for it that location. So we get:
```
fun () -> incr loc; !loc
```
* Third, that anonymous function is bound to `next_val`.
So any time `next_val` is called, it increments and returns the contents
of that one memory location `loc`.
Now imagine that we instead had written the following (broken) code:
```
let next_val_broken = fun () ->
  let counter = ref 0
  in incr counter;
     !counter
```
It's only a little different: the binding of `counter` occurs after
the `fun () ->` instead of before. But it makes a huge difference:
```
# next_val_broken ();;
- : int = 1
# next_val_broken ();;
- : int = 1
# next_val_broken ();;
- : int = 1
```
Every time we call `next_val_broken`, it returns `1`: we no longer
have a counter. What's going wrong here?
The problem is that every time `next_val_broken` is called, the first
thing it does is to evaluate `ref 0` to a new location that is initialized
to `0`. That location is then incremented to `1`, and `1` is returned.
Every call to `next_val_broken` is thus allocating a new ref cell, whereas
`next_val` allocates just one new ref cell.
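The same observation can be turned into a feature. In the sketch below, a hypothetical function `make_counter` (not part of the code above) deliberately allocates a new ref cell on each call and returns a closure over that cell, so every counter it produces is independent of the others.

```ocaml
(* [make_counter ()] allocates a fresh ref cell and returns a function
   that increments and then reads that one cell. Hypothetical name. *)
let make_counter () =
  let counter = ref 0 in
  fun () ->
    incr counter;
    !counter

let c1 = make_counter ()
let c2 = make_counter ()

let a = c1 ()   (* 1 *)
let b = c1 ()   (* 2: same cell as the previous call to [c1] *)
let c = c2 ()   (* 1: [c2] closes over its own cell *)
```

The placement of `ref 0` relative to `fun () ->` decides how often allocation happens: once per call to `make_counter`, but not once per call to `c1` or `c2`.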
**Syntax.** The first three of the following are new syntactic forms involving refs,
and the last is a syntactic form that we haven't yet fully explored.
* Ref creation: `ref e`
* Ref assignment: `e1 := e2`
* Dereference: `!e`
* Sequencing of effects: `e1; e2`
**Dynamic semantics.**
* To evaluate `ref e`,
- Evaluate `e` to a value `v`
- Allocate a new location `loc` in memory to hold `v`
- Store `v` in `loc`
- Return `loc`
* To evaluate `e1 := e2`,
- Evaluate `e2` to a value `v`, and `e1` to a location `loc`.
- Store `v` in `loc`.
- Return `()`, i.e., unit.
* To evaluate `!e`,
- Evaluate `e` to a location `loc`.
- Return the contents of `loc`.
* To evaluate `e1; e2`,
- First evaluate `e1` to a value `v1`.
- Then evaluate `e2` to a value `v2`.
- Return `v2`. (`v1` is not used at all.)
- If there are multiple expressions in a sequence, e.g., `e1; e2; ...; en`,
then evaluate each one in order from left to right, returning only `vn`.
Another way to think about this is that semicolon is right associative. For
example, `e1; e2; e3` is the same as `e1; (e2; e3)`.
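The sequencing rule can be observed by recording the order in which effects happen. This is only a sketch: the ref `trace` and the helper `note` are hypothetical names introduced for the illustration.

```ocaml
(* [trace] records messages, most recent first. *)
let trace : string list ref = ref []

(* [note s] adds [s] to the trace. Its return type is [unit]. *)
let note (s : string) : unit = trace := s :: !trace

(* Three expressions in sequence: only the last one's value is returned. *)
let result =
  note "first";
  note "second";
  42

(* Reverse the trace to get the order in which the effects happened. *)
let order = List.rev !trace
```

Here `result` is `42`, and `order` lists `"first"` before `"second"`, matching left-to-right evaluation of the sequence.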
Note that locations are values that can be passed to and returned from functions.
But unlike other values (e.g., integers, variants), there is no way to directly
write a location in an OCaml program. That's different than languages like C,
where programmers can directly write memory addresses and do arithmetic on pointers.
C programmers want that kind of low-level access to do things like interface with
hardware and build operating systems. Higher-level programmers are willing to
forego it to get *memory safety*. That's a hard term to define,
but according to [Hicks 2014][memory-safety-hicks] it intuitively means that
* pointers are only created in a safe way that defines their legal memory region,
* pointers can only be dereferenced if they point to their allotted memory region,
* that region is (still) defined.
[memory-safety-hicks]: http://www.pl-enthusiast.net/2014/07/21/memory-safety/
**Static semantics.** We have a new type constructor, `ref`, such that
`t ref` is a type for any type `t`. Note that the name `ref` is used
in two ways: as a type constructor, and as a function that constructs refs.
* `ref e : t ref` if `e : t`.
* `e1 := e2 : unit` if `e1 : t ref` and `e2 : t`.
* `!e : t` if `e : t ref`.
* `e1; e2 : t` if `e1 : unit` and `e2 : t`. Similarly, `e1; e2; ...; en : t`
if `e1 : unit`, `e2 : unit`, ... (i.e., all expressions except `en` have type `unit`),
and `en : t`.
The typing rule for semi-colon is designed to prevent programmer mistakes. For
example, a programmer who writes `2+3; 7` probably didn't mean to: there's
no reason to evaluate `2+3` then throw away the result and instead return `7`.
The compiler will give you a warning if you violate this particular typing rule.
To get rid of the warning (if you're sure that's what you need to do),
there's a function `ignore : 'a -> unit` in the standard library.
Using it, `ignore(2+3); 7` will compile without a warning. Of course,
you could code up `ignore` yourself: `let ignore _ = ()`.
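A more realistic use of `ignore` arises when a function both has an effect and returns a value, and only the effect is wanted. The helper `bump` below is a hypothetical example of such a function.

```ocaml
(* [bump r] increments [r] and returns the new contents. Hypothetical helper. *)
let bump (r : int ref) : int =
  incr r;
  !r

let r = ref 0

let final =
  ignore (bump r);   (* discard the int result on purpose *)
  ignore (bump r);
  !r
```

Without the calls to `ignore`, each `bump r` on the left of a semicolon would have type `int` rather than `unit`, which is what triggers the warning.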
**Aliasing.** Now that we have refs, we have *aliasing*: two refs could point to the
same memory location, hence updating through one causes the other to also be updated.
For example,
```
let x = ref 42
let y = ref 42
let z = x
let () = x := 43
let w = (!y) + (!z)
```
The result of executing that code is that `w` is bound to `85`, because `let z = x`
causes `z` and `x` to become aliases, hence updating `x` to be `43` also causes `z`
to be `43`.
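Aliasing is one reason mutable code is harder to reason about: a function written as if its arguments were distinct cells can behave differently when they are the same cell. The function `add_twice` below is a hypothetical example meant only to show that effect.

```ocaml
(* [add_twice a b] adds the contents of [b] to [a], two times.
   Hypothetical function, written as if [a] and [b] were distinct cells. *)
let add_twice (a : int ref) (b : int ref) : unit =
  a := !a + !b;
  a := !a + !b

(* Distinct cells: 1 + 1 + 1 = 3. *)
let p = ref 1
let q = ref 1
let () = add_twice p q
let distinct_result = !p

(* Aliased cells: the first update also changes what [b] sees,
   so the contents go from 1 to 2, and then from 2 to 4. *)
let r = ref 1
let () = add_twice r r
let aliased_result = !r
```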
**Equality.** OCaml has two equality operators, physical equality and structural
equality. The [documentation][pervasives] of `Pervasives.(==)` explains physical equality:
> `e1 == e2` tests for physical equality of `e1` and `e2`. On mutable types such as
> references, arrays, byte sequences, records with mutable fields and objects with
> mutable instance variables, `e1 == e2` is `true` if and only if physical modification
> of `e1` also affects `e2`. On non-mutable types, the behavior of `( == )` is
> implementation-dependent; however, it is guaranteed that `e1 == e2` implies
> `compare e1 e2 = 0`.
[pervasives]: http://caml.inria.fr/pub/docs/manual-ocaml/libref/Pervasives.html
One interpretation could be that `==` should be used only when comparing refs
(and other mutable data types) to see whether they point to the same location in
memory. Otherwise, don't use `==`.
Structural equality is also explained in the documentation of `Pervasives.(=)`:
> `e1 = e2` tests for structural equality of `e1` and `e2`. Mutable structures
> (e.g. references and arrays) are equal if and only if their current contents
> are structurally equal, even if the two mutable objects are not the same
> physical object. Equality between functional values raises `Invalid_argument`.
> Equality between cyclic data structures may not terminate.
Structural equality is usually what you want to test. For refs, it checks whether
the contents of the memory location are equal, regardless of whether they are the
same location.
The negation of physical equality is `!=`, and the negation of structural
equality is `<>`. This can be hard to remember.
Here are some examples involving equality and refs to illustrate the difference
between structural equality (`=`) and physical equality (`==`):
```
# let r1 = ref 3110;;
val r1 : int ref = {contents = 3110}
# let r2 = ref 3110;;
val r2 : int ref = {contents = 3110}
# r1 == r1;;
- : bool = true
# r1 == r2;;
- : bool = false
# r1 != r2;;
- : bool = true
# r1 = r1;;
- : bool = true
# r1 = r2;;
- : bool = true
# r1 <> r2;;
- : bool = false
# ref 3110 <> ref 2110;;
- : bool = true
```
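Physical equality is also a way to test for aliasing. The sketch below revisits the `x`, `y`, `z` refs from the aliasing discussion; the names on the left-hand sides of the comparisons are illustrative.

```ocaml
let x = ref 42
let y = ref 42
let z = x                        (* alias: the same cell as [x] *)

let same_contents = (x = y)      (* true: structurally equal contents *)
let same_cell_xy = (x == y)      (* false: two separate allocations *)
let same_cell_xz = (x == z)      (* true: [z] is an alias of [x] *)

let () = x := 43
let still_equal = (x = y)        (* false: the contents now differ *)
let alias_follows = (!z = 43)    (* true: the update is visible through [z] *)
```

Structural equality on refs can change over time as contents are mutated, whereas physical equality between two given refs cannot.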
## Mutable fields
The fields of a record can be declared as mutable, meaning their contents can be
updated without constructing a new record. For example, here is a record type
for two-dimensional colored points whose color field `c` is mutable:
```
# type point = {x:int; y:int; mutable c:string};;
type point = {x:int; y:int; mutable c:string; }
```
Note that `mutable` is a property of the field, rather than the type of the field.
In particular, we write `mutable field : type`, not `field : mutable type`.
The operator to update a mutable field is `<-`:
```
# let p = {x=0; y=0; c="red"};;
val p : point = {x=0; y=0; c="red"}
# p.c <- "white";;
- : unit = ()
# p;;
- : point = {x=0; y=0; c="white"}
# p.x <- 3;;
Error: The record field x is not mutable
```
The syntax and semantics of `<-` are similar to `:=` but complicated by fields:
* **Syntax:** `e1.f <- e2`
* **Dynamic semantics:** To evaluate `e1.f <- e2`, evaluate `e2` to a value `v2`,
and `e1` to a value `v1`, which must have a field named `f`. Update `v1.f`
to `v2`. Return `()`.
* **Static semantics:** `e1.f <- e2 : unit` if `e1 : t1` and
`t1 = {...; mutable f : t2; ...}`, and `e2 : t2`.
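To contrast updating a mutable field with "updating" an immutable one, here is a small sketch using the same `point` type. The helper `recolor` is a hypothetical name; the `{p with x = 3}` form is the ordinary record-copy syntax, which builds a new record rather than modifying `p`.

```ocaml
type point = {x : int; y : int; mutable c : string}

(* [recolor p c] destructively changes the color of [p]. Hypothetical helper. *)
let recolor (p : point) (c : string) : unit = p.c <- c

let p = {x = 0; y = 0; c = "red"}
let q = p                        (* alias of the same record *)
let () = recolor p "white"       (* visible through [q] as well *)

(* The immutable field [x] can only be "changed" by copying the record. *)
let moved = {p with x = 3}
let () = recolor moved "blue"    (* [moved] is a separate record from [p] *)
```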
## Refs and mutable fields
It turns out that refs are actually implemented as mutable fields. In
[`Pervasives`][pervasives] we find the following declaration:
```
type 'a ref = { mutable contents : 'a; }
```
And that's why when we create a ref it does in fact look like a record:
it *is* a record!
```
# let r = ref 3110;;
val r : int ref = {contents = 3110}
```
The other syntax we've seen for refs is in fact equivalent to simple OCaml functions:
```
(* Equivalent to [fun v -> {contents=v}]. *)
val ref : 'a -> 'a ref
(* Equivalent to [fun r -> r.contents]. *)
val (!) : 'a ref -> 'a
(* Equivalent to [fun r v -> r.contents <- v]. *)
val (:=) : 'a ref -> 'a -> unit
```
The reason we say "equivalent" is that those functions are actually
implemented not in OCaml but in the OCaml run-time, which is implemented
mostly in C. But the functions do behave the same as the OCaml source
given above in comments.
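As a sanity check on that claim, refs can be rebuilt from scratch with a one-field mutable record. This is a sketch under the assumption that behaving like the commented equivalents is all we need; the names `my_ref`, `make`, `get`, and `set` are hypothetical stand-ins for `ref`, `(!)`, and `(:=)`.

```ocaml
(* A hand-rolled ref type mirroring the declaration shown above. *)
type 'a my_ref = {mutable contents : 'a}

(* Counterpart of [ref]: allocate a record holding [v]. *)
let make (v : 'a) : 'a my_ref = {contents = v}

(* Counterpart of [(!)]: read the mutable field. *)
let get (r : 'a my_ref) : 'a = r.contents

(* Counterpart of [(:=)]: overwrite the mutable field. *)
let set (r : 'a my_ref) (v : 'a) : unit = r.contents <- v

let r = make 3110
let () = set r (get r + 1)
let now = get r
```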
## Arrays
Arrays are fixed-length mutable sequences with constant-time access and
update. So they are similar in various ways to refs, lists, and tuples.
Like refs, they are mutable. Like lists, they are (finite) sequences.
Like tuples, their length is fixed in advance and cannot be resized.
The syntax for arrays is similar to lists:
```
# let v = [|0.; 1.|];;
val v : float array = [|0.; 1.|]
```
That code creates an array whose length is fixed to be 2 and whose
contents are initialized to `0.` and `1.`. The name `array`
is a type constructor, much like `list`.
Later those contents can be changed using the `<-` operator:
```
# v.(0) <- 5.;;
- : unit = ()
# v;;
- : float array = [|5.; 1.|]
```
As you can see in that example, indexing into an array uses the
syntax `array.(index)`, where the parentheses are mandatory.
The [`Array` module][array] has many useful functions on arrays.
[array]: http://caml.inria.fr/pub/docs/manual-ocaml/libref/Array.html
**Syntax.**
* Array creation: `[|e0; e1; ...; en|]`
* Array indexing: `e1.(e2)`
* Array assignment: `e1.(e2) <- e3`
**Dynamic semantics.**
* To evaluate `[|e0; e1; ...; en|]`, evaluate each `ei` to a value `vi`,
create a new array of length `n+1`, and store each value in the array
at its index.
* To evaluate `e1.(e2)`, evaluate `e1` to an array value `v1`, and
`e2` to an integer `v2`. If `v2` is not within the bounds of the
array (i.e., `0` to `n-1`, where `n` is the length of the array),
raise `Invalid_argument`. Otherwise, index into `v1` to
get the value `v` at index `v2`, and return `v`.
* To evaluate `e1.(e2) <- e3`, evaluate each expression `ei` to a value `vi`.
Check that `v2` is within bounds, as in the semantics of indexing.
Mutate the element of `v1` at index `v2` to be `v3`.
**Static semantics.**
* `[|e0; e1; ...; en|] : t array` if `ei : t` for all the `ei`.
* `e1.(e2) : t` if `e1 : t array` and `e2 : int`.
* `e1.(e2) <- e3 : unit` if `e1 : t array` and `e2 : int` and `e3 : t`.
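The bounds check in the dynamic semantics can be seen by catching the exception. The sketch below assumes the program is compiled with bounds checking enabled, which is the default; `safe_get` is a hypothetical helper, not a function from the `Array` module.

```ocaml
let v = [|0.; 1.|]

(* In-bounds update and read. *)
let () = v.(0) <- 5.
let first = v.(0)

(* [safe_get a i] is [Some a.(i)] if [i] is in bounds, or [None] otherwise.
   Hypothetical helper that relies on [Invalid_argument] being raised. *)
let safe_get (a : 'a array) (i : int) : 'a option =
  try Some a.(i) with Invalid_argument _ -> None

let in_bounds = safe_get v 1       (* index 1 is the last valid index *)
let out_of_bounds = safe_get v 2   (* index 2 is past the end *)
```

In production code an explicit comparison against `Array.length a` is usually clearer than catching the exception; the version above is chosen only because it exercises the rule as stated.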
**Loops.**
OCaml has while loops and for loops. Their
syntax is as follows:
```
while e1 do e2 done
for x=e1 to e2 do e3 done
for x=e1 downto e2 do e3 done
```
The second form of `for` loop counts down from `e1` to `e2`—that is,
it decrements its index variable at each iteration.
Though not mutable features themselves, loops can be useful with mutable
data types like arrays. We can also use functions like
`Array.iter`, `Array.map`, and `Array.fold_left` instead of loops.
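For comparison, here is one computation, summing an array, written with each loop form and with `Array.fold_left`. It is a sketch with illustrative names (`sum_for`, `sum_while`, `sum_fold`, `countdown`), intended only to show the syntax in use.

```ocaml
let a = [|1; 2; 3; 4|]

(* Sum with a [for] loop and an accumulator ref. *)
let sum_for =
  let total = ref 0 in
  for i = 0 to Array.length a - 1 do
    total := !total + a.(i)
  done;
  !total

(* Same sum with a [while] loop; the index must also be a ref. *)
let sum_while =
  let total = ref 0 in
  let i = ref 0 in
  while !i < Array.length a do
    total := !total + a.(!i);
    incr i
  done;
  !total

(* Same sum without any explicit mutation. *)
let sum_fold = Array.fold_left (+) 0 a

(* [downto] visits indices from high to low. *)
let countdown =
  let seen = ref [] in
  for i = 3 downto 1 do
    seen := i :: !seen
  done;
  List.rev !seen
```

The loop index of a `for` loop is an ordinary immutable variable inside the body, which is why the `while` version needs `i` to be a ref.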
## Mutable data structures
As an example of a mutable data structure, let's look at stacks. We're
already familiar with functional stacks:
```
exception Empty
module type Stack = sig
  (* ['a t] is the type of stacks whose elements have type ['a]. *)
  type 'a t
  (* [empty] is the empty stack *)
  val empty : 'a t
  (* [push x s] is the stack whose top is [x] and the rest is [s]. *)
  val push : 'a -> 'a t -> 'a t
  (* [peek s] is the top element of [s].
   * raises: [Empty] if [s] is empty. *)
  val peek : 'a t -> 'a
  (* [pop s] is all but the top element of [s].
   * raises: [Empty] if [s] is empty. *)
  val pop : 'a t -> 'a t
end
```
An interface for a *mutable* or *non-persistent* stack would look a
little different:
```
module type MutableStack = sig
  (* ['a t] is the type of mutable stacks whose elements have type ['a].
   * The stack is mutable not in the sense that its elements can
   * be changed, but in the sense that it is not persistent:
   * the operations [push] and [pop] destructively modify the stack. *)
  type 'a t
  (* [empty ()] is the empty stack *)
  val empty : unit -> 'a t
  (* [push x s] modifies [s] to make [x] its top element.
   * The rest of the elements are unchanged. *)
  val push : 'a -> 'a t -> unit
  (* [peek s] is the top element of [s].
   * raises: [Empty] if [s] is empty. *)
  val peek : 'a t -> 'a
  (* [pop s] removes the top element of [s].
   * raises: [Empty] if [s] is empty. *)
  val pop : 'a t -> unit
end
```
Notice especially how the type of `empty` changes: instead of being a
value, it is now a function. This is typical of functions that create
mutable data structures. Also notice how the types of `push` and `pop`
change: instead of returning an `'a t`, they return `unit`. This again
is typical of functions that modify mutable data structures. In all
these cases, the use of `unit` makes the functions more like their
equivalents in an imperative language. The constructor for an empty
stack in Java, for example, might not take any arguments (which is
equivalent to taking unit). And the push and pop functions for a Java
stack might return `void`, which is equivalent to returning `unit`.
Now let's implement the mutable stack with a mutable linked list.
We'll have to code that up ourselves, since OCaml linked lists
are persistent.
```
module MutableRecordStack = struct
  (* An ['a node] is a node of a mutable linked list. It has
   * a field [value] that contains the node's value, and
   * a mutable field [next] that is [None] if the node has
   * no successor, or [Some n] if the successor is [n]. *)
  type 'a node = {value : 'a; mutable next : 'a node option}
  (* AF: An ['a t] is a stack represented by a mutable linked list.
   * The mutable field [top] is the first node of the list,
   * which is the top of the stack. The empty stack is represented
   * by {top = None}. The record {top = Some n} represents the
   * stack whose top is [n], and whose remaining elements are
   * the successors of [n]. *)
  type 'a t = {mutable top : 'a node option}
  let empty () =
    {top = None}
  (* To push [x] onto [s], we allocate a new node with [Some {...}].
   * Its successor is the old top of the stack, [s.top].
   * The top of the stack is mutated to be the new node. *)
  let push x s =
    s.top <- Some {value = x; next = s.top}
  let peek s =
    match s.top with
    | None -> raise Empty
    | Some {value; _} -> value
  (* To pop [s], we mutate the top of the stack to become its successor. *)
  let pop s =
    match s.top with
    | None -> raise Empty
    | Some {next; _} -> s.top <- next
end
```
Here is some example usage of the mutable stack:
```
# let s = empty ();;
val s : '_a t = {top = None}
# push 1 s;;
- : unit = ()
# s;;
- : int t = {top = Some {value = 1; next = None}}
# push 2 s;;
- : unit = ()
# s;;
- : int t = {top = Some {value = 2; next = Some {value = 1; next = None}}}
# pop s;;
- : unit = ()
# s;;
- : int t = {top = Some {value = 1; next = None}}
```
The `'_a` in the first utop response in that transcript is a
*weakly polymorphic type variable.* (Recent versions of OCaml print
such variables as `'_weak1` instead.) It indicates that the
type of elements of `s` is not yet fixed, but that as soon as
one element is added, the type (for that particular stack)
will forever be fixed. Weak type variables tend to appear
once mutability is involved, and they are important for the type
system to prevent certain kinds of errors, but we won't discuss
them further.
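A linked list of mutable nodes is not the only possible representation. As a point of comparison, here is a sketch of a second implementation with the same operations, under the assumption that a single ref cell holding an immutable list is acceptable as the representation. The module name `MutableListStack` is hypothetical, and `Empty` is redeclared so that the block stands alone.

```ocaml
exception Empty

(* A mutable stack represented as a ref cell holding an immutable list.
   The head of the list is the top of the stack. *)
module MutableListStack = struct
  type 'a t = 'a list ref

  (* Each call allocates a new ref cell, so stacks are independent. *)
  let empty () : 'a t = ref []

  let push (x : 'a) (s : 'a t) : unit = s := x :: !s

  let peek (s : 'a t) : 'a =
    match !s with
    | [] -> raise Empty
    | x :: _ -> x

  let pop (s : 'a t) : unit =
    match !s with
    | [] -> raise Empty
    | _ :: rest -> s := rest
end

let s : int MutableListStack.t = MutableListStack.empty ()
let () = MutableListStack.push 1 s
let () = MutableListStack.push 2 s
let top_before = MutableListStack.peek s   (* 2 *)
let () = MutableListStack.pop s
let top_after = MutableListStack.peek s    (* 1 *)
```

The type annotation on `s` fixes the element type up front, which sidesteps the weak type variable that appears in the transcript above.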
## Summary
We cover mutable data types in the "Advanced Data Structures" section of
this course because they are, in fact, harder to reason about. For
example, before refs, we didn't have to worry about aliasing in OCaml.
But mutability does have its uses. I/O is fundamentally about mutation.
And some data structures (like arrays, which we saw here, and
hash tables) cannot be implemented as efficiently without mutability.
Mutability thus offers great power, but with great power comes great
responsibility. Try not to abuse your new-found power!
## Terms and concepts
* address
* alias
* array
* assignment
* dereference
* deterministic
* immutable
* index
* loop
* memory safety
* mutable
* mutable field
* nondeterministic
* persistent
* physical equality
* pointer
* pure
* ref
* ref cell
* reference
* sequencing
* structural equality
## Further reading
* *Introduction to Objective Caml*, chapters 7 and 8
* *OCaml from the Very Beginning*, chapter 13
* *Real World OCaml*, chapter 8
* [*Relaxing the value restriction*][relaxing], by Jacques Garrigue, explains
more about weak type variables. Section 2 is a succinct explanation
of why they are needed.
[relaxing]: https://caml.inria.fr/pub/papers/garrigue-value_restriction-fiwflp04.pdf