All of the data structures we have covered in this course so far are functional. You can access old versions of these data structure, even when you update them. For example, you can cons an element onto a list without affecting references to the original list.
We have seen efficient functional implementations for stacks, queues, and binary trees. Can we come up with an efficient implementation of functional arrays, too? It turns out that we can, but we will need to use imperative language features. Our array will be implemented using an OCaml array for efficiency, but we need to be careful that updates to the array do not affect previous versions.
Let's define a signature:
module type FUN_ARRAY = sig type 'a t (* make n x creates an array of length n such that each element is x *) val make : int -> 'a -> 'a t (* get a i returns element i in the array a. Requires: 0 <= i < length a *) val get : 'a t -> int -> 'a (* set a i x is the array a with x inserted at index i Requires: 0 <= i < length a *) val set : 'a t -> int -> 'a -> 'a t end
Let's define the set
function so that it makes a copy
of the array it was given, then updates the copy.
module FunArray : FUN_ARRAY = struct type 'a t = 'a array let make = Array.make let get = Array.get let set (a : 'a t) (i : int) (x : 'a) : 'a t = let a' = Array.copy a in a'.(i) <- x; a' endThis array is purely functional, but it's slow, because we make a new array every time call
set
.
Say we execute FunArray.set a i x
. Let's imperatively
set a.(i) <- x
. Then, let's update all of the old
references to a
so that they point to the old version of
a
.
How are we going to update the old references to a
?
One idea is to make 'a t
a ref
. Every time
FunArray.set
is called, we change the value stored at
a
so that it has the same mappings as the original
array. Thus the old references to a
will still refer to
the old version of a
, even though the underlying array
is different.
type 'a t = 'a data ref and 'a data = Arr of 'a array | Diff of int * 'a * 'a t
An array can either be an Arr(arr)
, which has the
same mappings as the OCaml array arr
, or it can be a
Diff(i, x, a')
, which has all of the same mappings as the
functional array a', exception with the mapping i -> x
.
We then define the type 'a t
so that it is a
ref
to a value of type data
.
Getting a value from such an array is simple.
let rec get (a : 'a t) (i : int) : 'a = match !a with Arr arr -> arr.(i) | Diff(j, x, a') -> if i = j then x else get a' iIf the array points to an
Arr(arr)
, then we just look up
the given index in arr
. If it points to a Diff(j,
x, a')
, then we see if j
is the index we are
searching for. If it is, then we return x
, the value it
maps to.
Setting a value is more complicated
let set (a : 'a t) (i : int) (x : 'a) : 'a t = match !a with Arr arr -> let a' = ref (Arr arr) in a := Diff(i, arr.(i), a'); arr.(i) <- x; a' | Diff _ -> ref (Diff(i, x, a))
If a
points to a Diff
, then we return a
reference to a new Diff
which maps i
to
x
.
If a
points to an Arr(arr)
, then we
imperatively set arr.(i) <- x
. This operation invalidates
the old references to a
. These old references to
a
ought to refer to the old version of a
before the update. To get around this problem, we add a level of
indirection. We update the value stored at a
so that it
contains a Diff
preserving the old mapping of
i
.
Diff
nodes before
you reach the Arr
. There will be one of these
Diff
nodes per update.
We can improve performance when a user stops using the newest
version of the array and starts using an old version instead (for
example, in a backtracking algorithm). Every time a user sets or gets
an element of an array a
, we reverse the list of
Diff
nodes on the path to the Arr
so that
a
points directly to the Arr
. This operation
is known as rerooting.
(* Effects: reverses the list of Diff nodes along the path to the Arr *) let rec reroot (a : 'a t) : unit = match !a with Arr _ -> () | Diff(i, x, a') -> reroot a'; match !a' with Diff _ -> failwith "impossible" | Arr(arr) -> a := Arr(arr); a' := Diff(i, arr.(i), a); arr.(i) <- x
Every time the user calls get
or set
, we
reroot the tree. Note that this implementation is still inefficient
for some use cases. If a user frequently accesses many different
versions of an array, then rerooting will occur every time, and
performance will suffer. Another problem is that our definition of
reroot
is not tail recursive (it is possible to fix this
problem).
For more details, please see this paper on which this recitation is based. They use functional arrays to implement a functional disjoint set, a data structure that we will cover later in the course.
Here is our final implementation.
module FunArray : FUN_ARRAY = struct type 'a t = 'a data ref and 'a data = Arr of 'a array | Diff of int * 'a * 'a t let make (n : int) (x : 'a) : 'a t = ref (Arr (Array.make n x)) (* Effects: reverses the list of Diff nodes along the path to the Arr *) let rec reroot (a : 'a t) : unit = match !a with Arr _ -> () | Diff(i, x, a') -> reroot a'; match !a' with Diff _ -> failwith "impossible" | Arr(arr) -> a := Arr(arr); a' := Diff(i, arr.(i), a); arr.(i) <- x let rec get (a : 'a t) (i : int) : 'a = reroot a; match !a with Diff(j, x, a') -> failwith "impossible" | Arr arr -> arr.(i) let rec set (a : 'a t) (i : int) (x : 'a) : 'a t = reroot a; match !a with Arr arr -> let a' = ref (Arr arr) in a := Diff(i, arr.(i), a'); arr.(i) <- x; a' | Diff _ -> ref (Diff(i, x, a)) end