Functional Arrays

All of the data structures we have covered in this course so far are functional. You can access old versions of these data structure, even when you update them. For example, you can cons an element onto a list without affecting references to the original list.

We have seen efficient functional implementations for stacks, queues, and binary trees. Can we come up with an efficient implementation of functional arrays, too? It turns out that we can, but we will need to use imperative language features. Our array will be implemented using an OCaml array for efficiency, but we need to be careful that updates to the array do not affect previous versions.

First attempt

Let's define a signature:

module type FUN_ARRAY = sig
  type 'a t

  (* make n x creates an array of length n such that each element is x *)
  val make : int -> 'a -> 'a t
  (* get a i returns element i in the array a.
     Requires: 0 <= i < length a *)
  val get : 'a t -> int -> 'a

  (* set a i x is the array a with x inserted at index i
     Requires: 0 <= i < length a *)
  val set : 'a t -> int -> 'a -> 'a t
end

Let's define the set function so that it makes a copy of the array it was given, then updates the copy.

module FunArray : FUN_ARRAY = struct
  type 'a t = 'a array
  let make = Array.make 
  let get = Array.get
  let set (a : 'a t) (i : int) (x : 'a) : 'a t =
    let a' = Array.copy a in
      a'.(i) <- x;
      a'
end
This array is purely functional, but it's slow, because we make a new array every time call set.

Second attempt

Say we execute FunArray.set a i x. Let's imperatively set a.(i) <- x. Then, let's update all of the old references to a so that they point to the old version of a.

How are we going to update the old references to a? One idea is to make 'a t a ref. Every time FunArray.set is called, we change the value stored at a so that it has the same mappings as the original array. Thus the old references to a will still refer to the old version of a, even though the underlying array is different.

type 'a t = 'a data ref
and 'a data = Arr of 'a array | Diff of int * 'a * 'a t

An array can either be an Arr(arr), which has the same mappings as the OCaml array arr, or it can be a Diff(i, x, a'), which has all of the same mappings as the functional array a', exception with the mapping i -> x. We then define the type 'a t so that it is a ref to a value of type data.

Getting a value from such an array is simple.

let rec get (a : 'a t) (i : int) : 'a =
    match !a with
        Arr arr -> arr.(i)
      | Diff(j, x, a') -> if i = j then x else get a' i
If the array points to an Arr(arr), then we just look up the given index in arr. If it points to a Diff(j, x, a'), then we see if j is the index we are searching for. If it is, then we return x, the value it maps to.

Setting a value is more complicated

let set (a : 'a t) (i : int) (x : 'a) : 'a t =
    match !a with
        Arr arr ->
          let a' = ref (Arr arr) in
            a := Diff(i, arr.(i), a');
            arr.(i) <- x;
            a'
      | Diff _ -> ref (Diff(i, x, a))

If a points to a Diff, then we return a reference to a new Diff which maps i to x.

If a points to an Arr(arr), then we imperatively set arr.(i) <- x. This operation invalidates the old references to a. These old references to a ought to refer to the old version of a before the update. To get around this problem, we add a level of indirection. We update the value stored at a so that it contains a Diff preserving the old mapping of i.

Rerooting

Our implementation is efficient for accessing the most recent version of an array. However, accessing old versions of the array are slow, because you have to traverse a list of Diff nodes before you reach the Arr. There will be one of these Diff nodes per update.

We can improve performance when a user stops using the newest version of the array and starts using an old version instead (for example, in a backtracking algorithm). Every time a user sets or gets an element of an array a, we reverse the list of Diff nodes on the path to the Arr so that a points directly to the Arr. This operation is known as rerooting.

  (* Effects: reverses the list of Diff nodes along the path to the Arr *)
  let rec reroot (a : 'a t) : unit =
    match !a with
        Arr _ -> ()
      | Diff(i, x, a') ->
          reroot a';
          match !a' with
              Diff _ -> failwith "impossible"
            | Arr(arr) ->
                a := Arr(arr);
                a' := Diff(i, arr.(i), a);
                arr.(i) <- x

Every time the user calls get or set, we reroot the tree. Note that this implementation is still inefficient for some use cases. If a user frequently accesses many different versions of an array, then rerooting will occur every time, and performance will suffer. Another problem is that our definition of reroot is not tail recursive (it is possible to fix this problem).

For more details, please see this paper on which this recitation is based. They use functional arrays to implement a functional disjoint set, a data structure that we will cover later in the course.

Here is our final implementation.

module FunArray : FUN_ARRAY = struct
  type 'a t = 'a data ref
  and 'a data = Arr of 'a array | Diff of int * 'a * 'a t

  let make (n : int) (x : 'a) : 'a t =
    ref (Arr (Array.make n x))

  (* Effects: reverses the list of Diff nodes along the path to the Arr *)
  let rec reroot (a : 'a t) : unit =
    match !a with
        Arr _ -> ()
      | Diff(i, x, a') ->
          reroot a';
          match !a' with
              Diff _ -> failwith "impossible"
            | Arr(arr) ->
                a := Arr(arr);
                a' := Diff(i, arr.(i), a);
                arr.(i) <- x
                  
  let rec get (a : 'a t) (i : int) : 'a =
    reroot a;
    match !a with
        Diff(j, x, a') -> failwith "impossible"
      | Arr arr -> arr.(i)

  let rec set (a : 'a t) (i : int) (x : 'a) : 'a t =
    reroot a;
    match !a with
        Arr arr ->
          let a' = ref (Arr arr) in
            a := Diff(i, arr.(i), a');
            arr.(i) <- x;
            a'
      | Diff _ -> ref (Diff(i, x, a))
end