Functors

Topics:

sets represented with lists

code reuse

includes

functors

the standard library Map module

In the previous lecture we began studying the OCaml module system, and we saw how it supports namespaces and abstraction. Today we'll look at how it supports code reuse, so that code need not be copied or rewritten.

Sets

As a running example, let's consider a data structure for representing sets:

module type Set = sig
  type 'a t

  (* [empty] is the empty set *)
  val empty : 'a t

  (* [mem x s] holds iff [x] is an element of [s] *)
  val mem   : 'a -> 'a t -> bool

  (* [add x s] is the set [s] unioned with the set containing exactly [x] *)
  val add   : 'a -> 'a t -> 'a t

  (* [elts s] is a list containing the elements of [s].  No guarantee
   * is made about the ordering of that list. *)
  val elts  : 'a t -> 'a list
end

There are many other operations a set data structure might be expected to support, but these will suffice for now.

Here's an implementation of that interface:

module ListSetNoDups : Set = struct
  type 'a t   = 'a list
  let empty   = []
  let mem     = List.mem
  let add x s = if mem x s then s else x::s
  let elts s  = s
end

Note how add ensures that the representation never contains any duplicates, so the implementation of elts is quite easy. Of course, that makes the implementation of add linear time, which is not ideal. But if we want high-performance sets, lists are not the right representation type anyway; there are much better data structures for sets, and you might see some later in this course or in an algorithms course.

Here's a second implementation:

module ListSetDups : Set = struct
  type 'a t   = 'a list
  let empty   = []
  let mem     = List.mem
  let add x s = x::s
  let elts s  = List.sort_uniq Pervasives.compare s
end

In that implementation, the add operation is now constant time, and the elts operation is logarithmic time. For some (most?) workloads, this is a better choice than ListSetNoDups, especially if add is likely to be called more often than elts, and if sets aren't expected to contain many duplicates.

Includes

Suppose we wanted to add a function of_list : 'a list -> 'a t to the ListSetDups module that could construct a set out of a list. If we had access to the source code of both ListSetDups and Set, and if we were permitted to modify it, this wouldn't be hard. But what if they were third-party libraries for which we didn't have source code?

In CS 2110, you will have learned about extending classes and inheriting methods of a superclass. Those object-oriented language features provide (among many other things) the ability to reuse code. For example, a subclass includes all the methods of its superclasses, though some might by overridden.

OCaml provides a language features called includes that also enables code reuse. This feature is similar to the object-oriented example we just gave: it enables a structure to include all the values defined by another structure, or a signature to include all the names declared by another signature.

We can use includes to solve the problem of adding of_list to ListSetDups:

module ListSetDupsExtended = struct
  include ListSetDups
  let of_list lst = List.fold_right add lst empty
end

This code says that ListSetDupsExtended is a structure containing all the definitions of the ListSetDups structure, as well as a definition of of_list. We don't have to know the source code implementing ListSetDups to make this happen. (You might wonder we why can't simply write let of_list lst = lst. See the section on the semantics of includes, below, for the answer.)

If we want to provide a new implementation of one of the included functions we could do that too:

module ListSetDupsExtended = struct
  include ListSetDups
  let of_list lst = List.fold_right add lst empty
  let rec elts = function
    | [] -> []
    | h::t -> if mem h t then elts' t else h::(elts' t)
end

But that's actually a less efficient implementation of elts, so we probably shouldn't do that for real.

One misconception to watch out for is that the above example does not replace the original implementation of elts. If any code inside ListSetDups called that original implementation, it still would in ListSetDupsExtended. Why? Remember the semantics of modules: all definitions are evaluated from top to bottom, in order. So the new definition of elts above won't come into use until the very end of evaluation. This differs from what you might expect from object-oriented languages like Java, which use a language feature called dynamic dispatch to figure out which implementation to invoke.

Semantics of include

Includes can be used inside of structures and inside of signatures. Of course, when we include inside a signature, we must be including another signature. And when we include inside a structure, we must be including another structure.

Including a structure. Including a structure is like writing a local definition for each name defined in the module. Writing include ListSetDups as did above, for example, has an effect similar to writing exactly the following:

module ListSetDupsExtended = struct
  (* BEGIN all the includes *)
  type 'a t = 'a ListSetDups.t
  let empty = ListSetDups.empty
  let mem   = ListSetDups.mem
  let add   = ListSetDups.add
  let elts  = ListSetDups.elts
  (* END all the includes *)
  let of_list lst = List.fold_right add lst empty
end

But if the set of names defined inside ListSetDups ever changed, the include would reflect that change, whereas the static code we wrote above would not.

Encapsulation and includes. We mentioned above that you might wonder why we didn't write this simpler definition of of_list:

  let of_list lst = lst

The reason is that includes must obey encapsulation, just like the rest of the module system. ListSetDups was sealed with the module type Set, thus making 'a t abstract. So even ListSetDupsExtended is forbidden from knowing that 'a t and 'a list are synonyms.

A standard way to solve this problem is to rewrite the definitions as folllows:

module ListSetDupsImpl = struct
  type 'a t   = 'a list
  let empty   = []
  let mem     = List.mem
  let add x s = x::s
  let elts s  = List.sort_uniq Pervasives.compare s
end

module ListSetDups : Set = ListSetDupsImpl

module ListSetDupsExtended = struct
  include ListSetDupsImpl
  let of_list lst = lst
end

The important change is that ListSetDupsImpl is not sealed, so its type 'a t is not abstract. When we include it in ListSetDupsExtended, we can therefore exploit the fact that it's a synonym for 'a list.

What we just did is effectively the same as what Java does to handle the visibility modifiers public, private, etc. The "private version" of a class is like the Impl version above: anyone who can see that version gets to see all the exposed "things" (fields in Java, types in OCaml), without any encapsulation. The "public version" of a class is like the sealed version above: anyone who can see that version is forced to treat the "things" (fields in Java, types in OCaml) as abstract, hence encapsulated.

Including a signature. Signatures also support includes. For example, we could write:

module type SetExtended = sig
  include Set
  val of_list : 'a list -> 'a t
end

Which would have an effect similar to writing the following:

module type SetExtended = sig
  (* BEGIN all the includes *)
  type 'a t
  val empty : 'a t
  val mem   : 'a -> 'a t -> bool
  val add   : 'a -> 'a t -> 'a t
  val elts  : 'a t -> 'a list
  (* END all the includes *)
  val of_list : 'a list -> 'a t
end

And that module type would be suitable for ListSetDupsExtended:

module ListSetDupsExtended : SetExtended = struct
  include ListSetDupsImpl
  let of_list lst = lst 
end

By sealing the module, we've again made 'a t abstract, so no one outside that module gets to know that its representation type is actually 'a list.

Include vs. open

The include and open statements are quite similar, but they have a subtly different effect on a structure. Consider this code:

module M = struct
  let x = 0
end

module N = struct
  include M
  let y = x + 1
end

module O = struct
  open M
  let y = x + 1
end

If we enter that in the toplevel, we get the following response:

module M : sig val x : int end
module N : sig val x : int val y : int end
module O : sig val y : int end

Look closely at the values contained in each structure. N has both an x and y, whereas O has only a y. The reason is that include M causes all the definitions of M to also be included in N, so the definition of x from M is present in N. But open M only made those definitions available in the scope of O; it doesn't actually make them part of the structure. So O does not contain a definition of x, even though x is in scope during the evaluation of O's definition of y.

A metaphor for understanding this difference might be: open M imports definitions from M and makes them available for local consumption, but they aren't exported to the outside world. Whereas include M imports definitions from M, makes them available for local consumption, and additionally exports them to the outside world.

An example where include doesn't suffice

Suppose we wanted to write a function that could add a bunch of elements to a set, something like:

(* [add_all l s] is the set [s] unioned with all the elements of [l] *)
let rec add_all lst set = match lst with
  | [] -> set
  | h::t -> add_all t (add h set)

(Of course, we could code that up more tersely with a fold function.)

One possibility would be to copy that code into both structures. That would compile, but it's poor software engineering. If ever an improvement needs to be made to that code (e.g., replacing it with a fold function), we have to remember to do it in two places. So let's rule that out right away as a non-solution.

So instead, after defining both set implementations above, suppose we try to enter that code into utop outside of either implementation. We'll get an error:

# let rec add_all lst set = match lst with
    | [] -> set
    | h::t -> add_all t (add h set)
Error: Unbound value add

The problem is we either need to choose ListSetDups.add or ListSetNoDups.add. If we pick the former, the code will compile, but it will be useful only with that one implementation:

# let rec add_all lst set = match lst with
    | [] -> set
    | h::t -> add_all t (ListSetNoDups.add h set)
- : 'a list -> 'a ListSetNoDups.t -> 'a ListSetNoDups.t = <fun>

We could make the code parametric with respect to the add function:

let rec add_all' add lst set = match lst with
  | [] -> set
  | h::t -> add_all' add t (add h set)

let add_all_dups lst set = add_all' ListSetDups.add lst set
let add_all_nodups lst set = add_all' ListSetNoDups.add lst set

But this is annoying in a couple ways. First, we have to remember which function name to call, whereas all the other operations that are part of those modules have the same name, regardless of which module they're in. Second, the add_all functions live outside either module, so clients who open one of the modules won't automatically get the ability to name those functions.

Let's try to use includes to solve this problem. First, we write a module that contains the parameterized implementation of add_all':

module AddAll = struct
  let rec add_all' add lst set = match lst with
    | [] -> set
    | h::t -> add_all' add t (add h set)
end

module ListSetNoDupsExtended : SetExtended = struct
  include ListSetNoDups
  include AddAll
  let add_all lst set = add_all' add lst set
end

module ListSetDupsExtended : SetExtended = struct
  include ListSetDups
  include AddAll
  let add_all lst set = add_all' add lst set
end

We've succeeded, partially, in achieving code reuse. The code that implements add_all' has been factored out into a single location and reused in the two structures. So we could now replace it with an improved (?) version using a fold function:

module AddAll = struct
  let add_all' add lst set =
    let add' s x = add x s in
    List.fold_left add' set lst
end

But we've partially failed. We still have to write an implementation of add_all in both modules, and worse yet, those implementations are identical. So there's still code duplication occurring.

Could we do better? Yes. And that leads us to functors...

Functors

The problem we were having in the previous section was that we wanted to add code to two different modules, but that code needed to be parameterized on the details of the module to which it was being added. It's that kind of parameterization that is enabled by an OCaml language feature called functors.

The name is perhaps a bit itimidating, but a functor is simply a "function" from structures to structures. The word "function" is in quotation marks in that sentence only because it's a kind of function that's not interchangeable with the rest of the functions we've already seen. OCaml is stratified: structures are distinct from values, so functions from structures to structures cannot be written or used in the same way as functions from values to values. But conceptually, functors really are just functions.

Why "functor"? In category theory, a category contains morphisms, which are a generalization of functions as we known them, and a functor is map between categories. Likewise, OCaml structures contain functions, and OCaml functors map from structures to structures. For more information about category theory, bug Prof. Tate to teach CS 6117 again.

First, let's write a simple signature; there's nothing new here:

module type X = sig
  val x : int
end

Now, using that signature, here's a tiny example of a functor:

module IncX (M: X) = struct
  let x = M.x + 1
end

The functor's name is IncX. It's a function from structures to structures. As a function, it takes an input and produces an output. Its input is named M, and the type of its input is X. Its output is the structure that appears on the right-hand side of the equals sign: struct let x = M.x + 1.

Another way to think about IncX is that it's a parameterized structure. The parameter that it takes is named M and has type X. The structure itself has a single value named x in it. The value that x has will depend on the parameter M.

Since functors are functions, we apply them. Here's an example of applying IncX:

# module A = struct let x = 0 end
# A.x
- : int = 0

# module B = IncX(A)
# B.x
- : int = 1

# module C = IncX(B)
# C.x
- : int = 2

Each time, we pass IncX a structure. When we pass it the structure bound to the name A, the input to IncX is struct let x = 0 end. IncX takes that input and produces an output struct let x = A.x + 1 end. Since A.x is 0, the result is struct let x = 1 end. So B is bound to struct let x = 1 end. Similarly, C ends up being bound to struct let x = 2 end.

Although the functor IncX returns a structure that is quite similar to its input structure, that need not be the case. In fact, a functor can return any structure it likes, perhaps something very different than its input structure:

module MakeY (M:X) = struct
  let y = 42
end

The structure returned by MakeY has a value named y but does not have any value named x. In fact, MakeY completely ignores its input structure.

Functor syntax

In the functor syntax we've been using:

module F (M : S) = struct
  ...
end

the type annotation : S and the parentheses around it, (M : S) are required. The reason why is that type inference of the signature of a functor input is not supported.

Much like functions, functors can be written anonymously. The following two syntaxes for functors are equivalent:

module F (M : S) = struct
  ...
end

module F = functor (M : S) -> struct
  ...
end

The second form uses the functor keyword to create an anonymous functor, like how the fun keyword creates an anonymous function.

And functors can be parameterized on multiple structures:

module F (M1 : S1) ... (Mn : Sn) = struct
  ...
end

Of course, that's just syntactic sugar for a higher-order functor that takes a structure as input and returns an anonymous functor:

module F = functor (M1 : S1) -> ... -> functor (Mn : Sn) -> struct
  ...
end

If you want to specify the output type of a functor, the syntax is again similar to functions:

module F (M : Si) : So = struct
  ...
end

It's also possible to write the type annotation on the structure:

module F (M : Si) = (struct
  ...
end : So)

In that case, note that the parentheses around the anonymous structure are required. It turns out that syntax parallels a similar syntax for functions that we just haven't used before:

let f x = (x+1 : int)

The syntax for writing down the type of a functor is also much like the syntax for writing down the type of a function. Here is the type of a functor that takes a structure matching signature Si as input and returns a structure matching So:

functor (M : Si) -> So

If you wanted to annotate a functor definition with a type you can combine a couple of the syntaxes we've now seen:

module F : functor (M : Si) -> So = 
  functor (M : Si) -> struct ... end

The first occurrence of functor in that code means that what follows is a functor type, and the second occurrence means that what follows is an anonymous functor value.

Using a functor to eliminate code duplication

Since functors are really just parameterized modules, we can use them to produce functions that are parameterized on any structure that matches a signature. Here's an example of doing that.

Recall our data structures for stacks:

module type StackSig = sig
  type 'a t
  val empty : 'a t
  val push  : 'a -> 'a t -> 'a t
  val peek  : 'a t -> 'a
end

module ListStack = struct
  type 'a t = 'a list
  let empty = []
  let push x s = x::s
  let peek = function [] -> failwith "empty" | x::_ -> x
end

(* called MyStack because the standard library already has a Stack *)
module MyStack = struct
  type 'a t = Empty | Entry of 'a * 'a t
  let empty = Empty
  let push x s = Entry (x, s)
  let peek = function Empty -> failwith "empty" | Entry(x,_) -> x
end

Suppose we wanted to write code that would test a ListStack:

assert (ListStack.(empty |> push 1 |> peek) = 1)

Unfortunately, to test a MyStack, we'd have to duplicate that code:

assert (MyStack.(empty |> push 1 |> peek) = 1)

And if we had other stack implementations, we'd have to duplicate the test for them, too. That's not so horrible to contemplate if it's just one test case for a couple implementations, but if it's hundreds of tests for even a couple implementations, that's just too much duplication to be good software engineering.

Functors offer a better solution. We can write a functor that is parameterized on the stack implementation, and produces the test for that implementation:

module StackTester (S:StackSig) = struct
  assert (S.(empty |> push 1 |> peek) = 1)
end

module MyStackTester = StackTester(MyStack)
module ListStackTester = StackTester(ListStack)

Now we can factor out all our tests into the functor StackTester, and when we apply that functor to a stack implementation, we get a set of tests for that implementation. Of course, this would work with OUnit as well as assertions.

Back to the example where include didn't suffice

Earlier, we tried to add a function add_all to both ListSetNoDups and ListSetDups without having any duplicated code, but we didn't totally succeed. Now let's really do it right.

The problem we had earlier was that we needed to parameterize the implementation of add_all on the add function in the set data structure. We can accomplish that parameterization with a functor.

Here is a functor that takes in a structure named S that matches the Set signature, then produces a new structure having a single function named add_all in it:

module AddAll(S:Set) = struct
  let add_all lst set =
    let add' s x = S.add x s in
    List.fold_left add' set lst
end

Notice how the functor, in its body, uses S.add. It takes the implementation of add from S and uses it to implement add_all, thus solving the exact problem we had before when we tried to use includes.

When we apply AddAll to our set implementations, we get structures containing an add_all function for each implementation:

# module AddAllListSetDups = AddAll(ListSetDups);;
module AddAllListSetDups : 
  sig
    val add_all : 'a list -> 'a ListSetDups.t -> 'a ListSetDups.t               
  end

# module AddAllListSetNoDups = AddAll(ListSetNoDups);;
module AddAllListSetNoDups : 
  sig
    val add_all : 'a list -> 'a ListSetNoDups.t -> 'a ListSetNoDups.t               
  end

So the functor has enabled the code reuse we couldn't get before: we now can implement a single add_all function and from it derive implementations for two different set structures.

But that's the only function those two structures contain. Really what we want is a full set implementation that also contains the add_all function. We can get that by combining includes with functors:

module ExtendSet(S:Set) = struct
  include S

  let add_all lst set =
    let add' s x = S.add x s in
    List.fold_left add' set lst
end

That functor takes a set structure as input, and produces a structure that contains everything from that set structure (because of the include) as well as a new function add_all that is implemented using the add function from the set.

When we apply the functor, we get a very nice set data structure as a result:

# module ListSetNoDupsExtended = ExtendSet(ListSetNoDups);;
module ListSetNoDupsExtended :
  sig 
    type 'a t = 'a ListSetNoDups.t                                      
    val empty : 'a t
    val mem : 'a -> 'a t -> bool
    val add : 'a -> 'a t -> 'a t
    val elts : 'a t -> 'a list
    val add_all : 'a list -> 'a t -> 'a t
  end

Notice how the output structure records the fact that its type t is the same type as the type t in its input structure. They share it because of the include.

Stepping back, what we just did bears more than a passing resemblance to what you're used to doing in CS 2110 with class extension in Java. We created a base module and extended its functionality with new code while preserving its old functionality. But whereas class extension necessitates that the newly extended class is a subtype of the old, and that it still has all the old functionality, OCaml functors are more fine-grained in what they can accomplish. We can choose whether they include the old functionality. And no subtyping relationships are necessarily involved. Moreover, the functor we wrote can be used to extend any set implementation with add_all, whereas class extension applies to just a single base class. There are ways of achieving something similar in Java with mixins, which weren't supported before Java 1.5.

Standard library Map

The standard library's Map module implements a dictionary data structure using balanced binary trees. You can see the implementation of that module on GitHub as well as its interface.

The Map module defines a functor Make that creates a structure implementing a map over a particular type of keys. That type is the input structure to Make. The type of that input structure is Map.OrderedType, which are types that support a compare operation:

module type OrderedType = sig
  type t
  val compare : t -> t -> int
end

The Map module needs ordering because balanced binary trees need to be able to compare keys to determine whether one is greater than another. According to the library's documentation, compare must satisfy this specification:

(* This is a two-argument function [f] such that
 * [f e1 e2] is zero if the keys [e1] and [e2] are equal,
 * [f e1 e2] is strictly negative if [e1] is smaller than [e2],
 * and [f e1 e2] is strictly positive if [e1] is greater than [e2].
 * Example: a suitable ordering function is the generic structural
 * comparison function [Pervasives.compare]. *)
val compare : t -> t -> int

Arguably this specification is a missed opportunity for good design: the library designers could instead have defined a variant:

type order = LT | EQ | GT

and required the output type of compare to be order. But historically many languages have used comparison functions with similar specifications, such as the C standard library's strcmp function.

The output of Map.Make is a structure whose type is (almost) Map.S and supports all the usual operations we would expect from a dictionary:

module type S =
  sig
    type key
    type 'a t

    val empty: 'a t
    val mem:   key -> 'a t -> bool
    val add:   key -> 'a -> 'a t -> 'a t
    val find:  key -> 'a t -> 'a
    ...
  end

There are two reasons why we say that the output is "almost" that type:

The Map module actually specifies a sharing constraint (which we covered in the previous notes): type key = Ord.t. That is, the output of Map.Make shares its key type with the type Ord.t. That enables keys to be compared with Ord.compare. The way that sharing constraint is specified is in the type of Make (which can be found in map.mli, the interface file for the map compilation unit):
```
module Make : functor (Ord : OrderedType) -> (S with type key = Ord.t)
```
The Map module actually specifies something called a variance on the representation type, writing +'a t instead of 'a t as we did above. We won't concern ourselves with what this means; it's related to subtyping and polymorphic variants.

The functor Map.Make itself (which can be found in map.ml, the implementation file for the map compilation unit) is currently defined as follows, though of course the library is free to change its internals in the future:

module Make(Ord: OrderedType) = struct
  type key = Ord.t

  type 'a t =
    | Empty
    | Node of 'a t * key * 'a * 'a t * int
      (* left subtree * key * value * right subtree * height of node *)

  let empty = Empty

  let rec mem x = function
    | Empty -> false
    | Node(l, v, _, r, _) ->
        let c = Ord.compare x v in
        c = 0 || mem x (if c < 0 then l else r)
  ...

The key type is defined to be a synonym for the type t inside Ord, so key values are comparable using Ord.compare. The mem function uses that to compare keys and decide whether to recurse on the left subtree or right subtree.

Using the Map module

A map for integer keys. To create a map, we have to pass a structure into Map.Make, and that structure has to define a type t and compare function. The simplest way to do that is to pass an anonymous structure into the functor:

# module IntMap = Map.Make(struct type t = int let compare = Pervasives.compare end);;
module IntMap : 
  sig
    type key = int                                                              
    type 'a t
    val empty : 'a t
    ...
  end

# open IntMap;;

# let m1 = add 1 "one" empty;;
val m1 : string t = <abstr>

# find 1 m1;;
- : string = "one"

# mem 42 m1;;
- : bool = false

# find 42 m1;;
Exception: Not_found.

# bindings m1;;
- : (int * string) list = [(1, "one")]

# let m2 = add 1 1. empty;;
val m2 : float t = <abstr>

# bindings m2;;
- : (int * float) list = [(1, 1.)]

Here are some things to note about the utop transcript above:

We can write a structure on one line, even though until now we've always used line breaks to keep them readable. When writing a structure on on line (which we'll only do for really short structures) it can be useful to use the double semicolon between definitions to enhance readability:
```
# module IntMap = Map.Make(struct type t = int;; let compare = Pervasives.compare end);;
```
This is an exception to the general style rule of avoiding double semicolon inside source code.

If we didn't want to pass an anonymous structure, we could instead define a module and pass it:
```
module Int = struct
  type t = int
  let compare = Pervasives.compare
end
module IntMap = Map.Make(Int)
```
The signature of the structure returned by Map.Make records the fact that keys are of type int. The type 'a t is the name of the representation type of an IntMap. The 'a type variable in it is the type of values in the map. Although in general the map could have any value type, once we add a single value to a map, that "pins down" the value type of that particular map. When we add the binding from key 1 to string "one above, notice that the map value returned is of type string t.
The bindings function of a map returns an association list of all the bindings in the map. Association lists are, of course, another data structure that implements a dictionary. But they are less efficient than the balanced binary search tree implementation used by Map.
The mem function tests whether a key is a member of a map. The find function finds the value associated with a key, and raises the Not_found exception if the key is not bound in the map. That's the same exception that List.assoc raises if a key is not bound in an association list.

A map for string keys. If a module already provides a type t that can be compared, we can immediately use that module as an argument to Map.Make. Several standard library modules are designed to be used in that way. For example, the String module defines a type t and a compare function that meet the specification of Map.OrderedType. So we can easily create maps whose key type is string:

# module StringMap = Map.Make(String);;
module StringMap :
  sig
    type key = string                                                           
    ...
  end

Now we could use the string map like we used the int map. This time, for sake of example, let's not open the StringMap module:

# let m = StringMap.(add "one" 1 empty);;
# let m' = StringMap.(add "two" 2 m);;
# StringMap.bindings m';;
- : (string * int) list = [("one", 1); ("two", 2)] 
# StringMap.bindings m;;
- : (string * int) list = [("one", 1)] 
#

Note that maps are a functional data structure: adding a mapping to m did not mutate m; rather, it produced a new map that we bound to m', and both the new map and old map remain available for use.

A map for record keys. When the type of a key becomes more complicated than a built-in primitive type, we might want to write our own custom comparison function. For example, suppose we want a map in which keys are records representing names, and in which names are sorted alphabetically by last name then by first name. In the code below, we provide a module Name that can compare records that way:

type name = {first:string; last:string}

module Name = struct
  type t = name
  let compare {first=first1;last=last1}
              {first=first2;last=last2} =
    match Pervasives.compare last1 last2 with
    | 0 -> Pervasives.compare first1 first2
    | c -> c
end

The Name module can be used as input to Map.Make because it matches the Map.OrderedType signature:

module NameMap = Map.Make(Name)

And now we could add some names to a map. Below, for sake of example, we map some names to birth years, and we use the pipeline operator to easily add multiple bindings one after another:

let k1 = {last="Kardashian"; first="Kourtney"}
let k2 = {last="Kardashian"; first="Kimberly"}
let k3 = {last="Kardashian"; first="Khloe"}
let k4 = {last="West"; first="Kanye"}

let nm = NameMap.(
  empty |> add k1 1979 |> add k2 1980 
        |> add k3 1984 |> add k4 1977)

let lst = NameMap.bindings nm

The value of lst will be

[({first = "Khloe"; last = "Kardashian"}, 1984);
 ({first = "Kimberly"; last = "Kardashian"}, 1980);
 ({first = "Kourtney"; last = "Kardashian"}, 1979);
 ({first = "Kanye"; last = "West"}, 1977)]

Note how the order of keys in that list is not the same as the order in which we added them. The list is sorted according to the Name.compare function we wrote. Several of the other functions in the Map.S signature will also process map bindings in that sorted order—for example, map, fold, and iter.

Code reuse with Map

Stepping back from the mechanics of how to use Map, let's think about how it achieves code reuse. The implementor of Map had a tricky problem to solve: balanced binary search trees require a way to compare keys, but the implementor can't know in advance all the different types of keys that a client of the data structure will want to use. And each type of key might need its own comparison function. Although the standard library's Pervasives.compare can be used to compare any two values of the same type, the result it returns isn't necessarily what a client will want. For example, it's not guaranteed to sort names in the way we wanted above.

So the implementor of Map parameterized it on a structure that bundles together the type of keys with a function that can be used to compare them. It's the client's responsibility to implement that structure. Given it, all the code in Map can be re-used by the client.

The Java Collections Framework solves a similar problem in the TreeMap class, which has a constructor that takes a Comparator. There, the client has the responsibility of implementing a class for comparisons, rather than a structure. Though the language features are different, the idea is the same.

Summary

Functors are an advanced language feature in OCaml that might seem mysterious at first. If so, keep in mind: they're really just a kind of function that takes a structure as input and returns a structure as output. The reason they don't behave quite like normal OCaml functions is that structures are not first-class values in OCaml: you can't write regular functions that take a structure as input or return a structure as output. But functors can do just that.

Functors and includes enable code reuse. The kinds of code reuse you learned to achieve in CS 2110 with object-oriented features can also be achieved with functors and include. That's not to say that functors and includes are exactly equivalent to those object-oriented features: some kinds of code reuse might be easier to achieve with one set of features than the other.

One way to think about this might be that class extension is a very limited (but very useful) combination of functors and includes: extending a class is like writing a functor that takes the base class as input, includes it, then adds new functions. But functors provide more general capability than class extension, because they can compute arbitrary functions of their input structure, rather than being limited to just certain kinds of extension.

Perhaps the most important idea to get out of studying the OCaml module system is an appreciation for the aspects of modularity that transcend any given language: namespaces, abstraction, and code reuse. Having seen those ideas in a couple very different languages, you're equipped to recognize them more clearly in the next language you learn.

Terms and concepts

code reuse
functor
include
maintainability
maps
modularity
open
parameterized structure
set representations
signatures
structures

Functors

Sets

Includes

Semantics of include

Include vs. open

An example where include doesn't suffice

Functors

Functor syntax

Using a functor to eliminate code duplication

Back to the example where include didn't suffice

Standard library Map

Using the Map module

Code reuse with Map

Summary

Terms and concepts

Further reading