Functors
Topics:
- sets represented with lists
- code reuse
- includes
- functors
- the standard library
Map
module
In the previous lecture we began studying the OCaml module system, and we saw how it supports namespaces and abstraction. Today we'll look at how it supports code reuse, so that code need not be copied or rewritten.
Sets
As a running example, let's consider a data structure for representing sets:
module type Set = sig
type 'a t
(* [empty] is the empty set *)
val empty : 'a t
(* [mem x s] holds iff [x] is an element of [s] *)
val mem : 'a -> 'a t -> bool
(* [add x s] is the set [s] unioned with the set containing exactly [x] *)
val add : 'a -> 'a t -> 'a t
(* [elts s] is a list containing the elements of [s]. No guarantee
* is made about the ordering of that list. *)
val elts : 'a t -> 'a list
end
There are many other operations a set data structure might be expected to support, but these will suffice for now.
Here's an implementation of that interface:
module ListSetNoDups : Set = struct
type 'a t = 'a list
let empty = []
let mem = List.mem
let add x s = if mem x s then s else x::s
let elts s = s
end
Note how add
ensures that the representation never contains any duplicates,
so the implementation of elts
is quite easy. Of course, that makes the
implementation of add
linear time, which is not ideal. But if we want
high-performance sets, lists are not the right representation type anyway;
there are much better data structures for sets, and you might see some
later in this course or in an algorithms course.
Here's a second implementation:
module ListSetDups : Set = struct
type 'a t = 'a list
let empty = []
let mem = List.mem
let add x s = x::s
let elts s = List.sort_uniq Pervasives.compare s
end
In that implementation, the add
operation is now constant time, and
the elts
operation is logarithmic time. For some (most?) workloads, this
is a better choice than ListSetNoDups
, especially if add
is
likely to be called more often than elts
, and if sets aren't expected
to contain many duplicates.
Includes
Suppose we wanted to add a function of_list : 'a list -> 'a t
to the
ListSetDups
module that could construct a set out of a list. If we
had access to the source code of both ListSetDups
and Set
, and if we
were permitted to modify it, this wouldn't be hard. But what if they
were third-party libraries for which we didn't have source code?
In CS 2110, you will have learned about extending classes and inheriting methods of a superclass. Those object-oriented language features provide (among many other things) the ability to reuse code. For example, a subclass includes all the methods of its superclasses, though some might by overridden.
OCaml provides a language features called includes that also enables code reuse. This feature is similar to the object-oriented example we just gave: it enables a structure to include all the values defined by another structure, or a signature to include all the names declared by another signature.
We can use includes to solve the problem of adding of_list
to ListSetDups
:
module ListSetDupsExtended = struct
include ListSetDups
let of_list lst = List.fold_right add lst empty
end
This code says that ListSetDupsExtended
is a structure containing all
the definitions of the ListSetDups
structure, as well as a definition
of of_list
. We don't have to know the source code implementing ListSetDups
to make this happen. (You might wonder we why can't simply write
let of_list lst = lst
. See the section on the semantics of includes,
below, for the answer.)
If we want to provide a new implementation of one of the included functions we could do that too:
module ListSetDupsExtended = struct
include ListSetDups
let of_list lst = List.fold_right add lst empty
let rec elts = function
| [] -> []
| h::t -> if mem h t then elts' t else h::(elts' t)
end
But that's actually a less efficient implementation of
elts
, so we probably shouldn't do that for real.
One misconception to watch out for is that the above example
does not replace the original implementation of elts
. If
any code inside ListSetDups
called that original implementation,
it still would in ListSetDupsExtended
. Why? Remember
the semantics of modules: all definitions are evaluated from
top to bottom, in order. So the new definition of elts
above
won't come into use until the very end of evaluation. This differs
from what you might expect from object-oriented languages like Java,
which use a language feature called dynamic dispatch to figure
out which implementation to invoke.
Semantics of include
Includes can be used inside of structures and inside of signatures. Of course, when we include inside a signature, we must be including another signature. And when we include inside a structure, we must be including another structure.
Including a structure.
Including a structure is like writing a local definition for each name defined
in the module. Writing include ListSetDups
as did above,
for example, has an effect similar to writing exactly the following:
module ListSetDupsExtended = struct
(* BEGIN all the includes *)
type 'a t = 'a ListSetDups.t
let empty = ListSetDups.empty
let mem = ListSetDups.mem
let add = ListSetDups.add
let elts = ListSetDups.elts
(* END all the includes *)
let of_list lst = List.fold_right add lst empty
end
But if the set of names defined inside ListSetDups
ever changed,
the include
would reflect that change, whereas the static code
we wrote above would not.
Encapsulation and includes.
We mentioned above that you might wonder why we didn't write this simpler definition
of of_list
:
let of_list lst = lst
The reason is that includes must obey encapsulation, just like the rest
of the module system. ListSetDups
was sealed with the module type
Set
, thus making 'a t
abstract. So even ListSetDupsExtended
is
forbidden from knowing that 'a t
and 'a list
are synonyms.
A standard way to solve this problem is to rewrite the definitions as folllows:
module ListSetDupsImpl = struct
type 'a t = 'a list
let empty = []
let mem = List.mem
let add x s = x::s
let elts s = List.sort_uniq Pervasives.compare s
end
module ListSetDups : Set = ListSetDupsImpl
module ListSetDupsExtended = struct
include ListSetDupsImpl
let of_list lst = lst
end
The important change is that ListSetDupsImpl
is not sealed, so its type 'a t
is not abstract. When we include it in ListSetDupsExtended
, we can therefore
exploit the fact that it's a synonym for 'a list
.
What we just did is effectively the same as what Java does to handle the
visibility modifiers public
, private
, etc. The "private version" of
a class is like the Impl
version above: anyone who can see that
version gets to see all the exposed "things" (fields in Java, types in
OCaml), without any encapsulation. The "public version" of a class is
like the sealed version above: anyone who can see that version is
forced to treat the "things" (fields in Java, types in OCaml) as abstract,
hence encapsulated.
Including a signature. Signatures also support includes. For example, we could write:
module type SetExtended = sig
include Set
val of_list : 'a list -> 'a t
end
Which would have an effect similar to writing the following:
module type SetExtended = sig
(* BEGIN all the includes *)
type 'a t
val empty : 'a t
val mem : 'a -> 'a t -> bool
val add : 'a -> 'a t -> 'a t
val elts : 'a t -> 'a list
(* END all the includes *)
val of_list : 'a list -> 'a t
end
And that module type would be suitable for ListSetDupsExtended
:
module ListSetDupsExtended : SetExtended = struct
include ListSetDupsImpl
let of_list lst = lst
end
By sealing the module, we've again made 'a t
abstract, so no one outside
that module gets to know that its representation type is actually 'a list
.
Include vs. open
The include
and open
statements are quite similar, but they have
a subtly different effect on a structure. Consider this code:
module M = struct
let x = 0
end
module N = struct
include M
let y = x + 1
end
module O = struct
open M
let y = x + 1
end
If we enter that in the toplevel, we get the following response:
module M : sig val x : int end
module N : sig val x : int val y : int end
module O : sig val y : int end
Look closely at the values contained in each structure. N
has both
an x
and y
, whereas O
has only a y
. The reason is that
include M
causes all the definitions of M
to also be included in
N
, so the definition of x
from M
is present in N
. But open M
only made those definitions available in the scope of O
; it doesn't
actually make them part of the structure. So O
does not contain
a definition of x
, even though x
is in scope during the evaluation
of O
's definition of y
.
A metaphor for understanding this difference might be: open M
imports definitions from M
and makes them available for local
consumption, but they aren't exported to the outside world.
Whereas include M
imports definitions from M
, makes them
available for local consumption, and additionally exports
them to the outside world.
An example where include doesn't suffice
Suppose we wanted to write a function that could add a bunch of elements to a set, something like:
(* [add_all l s] is the set [s] unioned with all the elements of [l] *)
let rec add_all lst set = match lst with
| [] -> set
| h::t -> add_all t (add h set)
(Of course, we could code that up more tersely with a fold function.)
One possibility would be to copy that code into both structures. That would compile, but it's poor software engineering. If ever an improvement needs to be made to that code (e.g., replacing it with a fold function), we have to remember to do it in two places. So let's rule that out right away as a non-solution.
So instead, after defining both set implementations above, suppose we try to enter that code into utop outside of either implementation. We'll get an error:
# let rec add_all lst set = match lst with
| [] -> set
| h::t -> add_all t (add h set)
Error: Unbound value add
The problem is we either need to choose ListSetDups.add
or ListSetNoDups.add
.
If we pick the former, the code will compile, but it will be useful only
with that one implementation:
# let rec add_all lst set = match lst with
| [] -> set
| h::t -> add_all t (ListSetNoDups.add h set)
- : 'a list -> 'a ListSetNoDups.t -> 'a ListSetNoDups.t = <fun>
We could make the code parametric with respect to the add
function:
let rec add_all' add lst set = match lst with
| [] -> set
| h::t -> add_all' add t (add h set)
let add_all_dups lst set = add_all' ListSetDups.add lst set
let add_all_nodups lst set = add_all' ListSetNoDups.add lst set
But this is annoying in a couple ways. First, we have to remember which
function name to call, whereas all the other operations that are part of
those modules have the same name, regardless of which module they're in.
Second, the add_all
functions live outside either module, so clients
who open one of the modules won't automatically get the ability to name
those functions.
Let's try to use includes to solve this problem. First,
we write a module that contains the parameterized implementation
of add_all'
:
module AddAll = struct
let rec add_all' add lst set = match lst with
| [] -> set
| h::t -> add_all' add t (add h set)
end
module ListSetNoDupsExtended : SetExtended = struct
include ListSetNoDups
include AddAll
let add_all lst set = add_all' add lst set
end
module ListSetDupsExtended : SetExtended = struct
include ListSetDups
include AddAll
let add_all lst set = add_all' add lst set
end
We've succeeded, partially, in achieving code reuse. The code that
implements add_all'
has been factored out into a single location and
reused in the two structures. So we could now replace it with an
improved (?) version using a fold function:
module AddAll = struct
let add_all' add lst set =
let add' s x = add x s in
List.fold_left add' set lst
end
But we've partially failed. We still have to write an implementation
of add_all
in both modules, and worse yet, those implementations
are identical. So there's still code duplication occurring.
Could we do better? Yes. And that leads us to functors...
Functors
The problem we were having in the previous section was that we wanted to add code to two different modules, but that code needed to be parameterized on the details of the module to which it was being added. It's that kind of parameterization that is enabled by an OCaml language feature called functors.
The name is perhaps a bit itimidating, but a functor is simply a "function" from structures to structures. The word "function" is in quotation marks in that sentence only because it's a kind of function that's not interchangeable with the rest of the functions we've already seen. OCaml is stratified: structures are distinct from values, so functions from structures to structures cannot be written or used in the same way as functions from values to values. But conceptually, functors really are just functions.
Why "functor"? In category theory, a category contains morphisms, which are a generalization of functions as we known them, and a functor is map between categories. Likewise, OCaml structures contain functions, and OCaml functors map from structures to structures. For more information about category theory, bug Prof. Tate to teach CS 6117 again.
First, let's write a simple signature; there's nothing new here:
module type X = sig
val x : int
end
Now, using that signature, here's a tiny example of a functor:
module IncX (M: X) = struct
let x = M.x + 1
end
The functor's name is IncX
. It's a function from structures to structures.
As a function, it takes an input and produces an output. Its input
is named M
, and the type of its input is X
. Its output
is the structure that appears on the right-hand side of the equals sign:
struct let x = M.x + 1
.
Another way to think about IncX
is that it's a parameterized structure.
The parameter that it takes is named M
and has type X
. The structure itself
has a single value named x
in it. The value that x
has will depend
on the parameter M
.
Since functors are functions, we apply them. Here's an example of applying
IncX
:
# module A = struct let x = 0 end
# A.x
- : int = 0
# module B = IncX(A)
# B.x
- : int = 1
# module C = IncX(B)
# C.x
- : int = 2
Each time, we pass IncX
a structure. When we pass it the structure bound
to the name A
, the input to IncX
is struct let x = 0 end
. IncX
takes that input and produces an output struct let x = A.x + 1 end
.
Since A.x
is 0
, the result is struct let x = 1 end
. So B
is bound to struct let x = 1 end
. Similarly, C
ends up being
bound to struct let x = 2 end
.
Although the functor IncX
returns a structure that is quite similar to
its input structure, that need not be the case. In fact, a functor can
return any structure it likes, perhaps something very different than its
input structure:
module MakeY (M:X) = struct
let y = 42
end
The structure returned by MakeY
has a value named y
but does not
have any value named x
. In fact, MakeY
completely ignores its
input structure.
Functor syntax
In the functor syntax we've been using:
module F (M : S) = struct
...
end
the type annotation : S
and the parentheses around it, (M : S)
are required.
The reason why is that type inference of the signature of a functor input
is not supported.
Much like functions, functors can be written anonymously. The following two syntaxes for functors are equivalent:
module F (M : S) = struct
...
end
module F = functor (M : S) -> struct
...
end
The second form uses the functor
keyword to create an anonymous functor,
like how the fun
keyword creates an anonymous function.
And functors can be parameterized on multiple structures:
module F (M1 : S1) ... (Mn : Sn) = struct
...
end
Of course, that's just syntactic sugar for a higher-order functor that takes a structure as input and returns an anonymous functor:
module F = functor (M1 : S1) -> ... -> functor (Mn : Sn) -> struct
...
end
If you want to specify the output type of a functor, the syntax is again similar to functions:
module F (M : Si) : So = struct
...
end
It's also possible to write the type annotation on the structure:
module F (M : Si) = (struct
...
end : So)
In that case, note that the parentheses around the anonymous structure are required. It turns out that syntax parallels a similar syntax for functions that we just haven't used before:
let f x = (x+1 : int)
The syntax for writing down the type of a functor is also much like the
syntax for writing down the type of a function. Here is the type of
a functor that takes a structure matching signature Si
as input
and returns a structure matching So
:
functor (M : Si) -> So
If you wanted to annotate a functor definition with a type you can combine a couple of the syntaxes we've now seen:
module F : functor (M : Si) -> So =
functor (M : Si) -> struct ... end
The first occurrence of functor
in that code means that what follows
is a functor type, and the second occurrence means that what follows
is an anonymous functor value.
Using a functor to eliminate code duplication
Since functors are really just parameterized modules, we can use them to produce functions that are parameterized on any structure that matches a signature. Here's an example of doing that.
Recall our data structures for stacks:
module type StackSig = sig
type 'a t
val empty : 'a t
val push : 'a -> 'a t -> 'a t
val peek : 'a t -> 'a
end
module ListStack = struct
type 'a t = 'a list
let empty = []
let push x s = x::s
let peek = function [] -> failwith "empty" | x::_ -> x
end
(* called MyStack because the standard library already has a Stack *)
module MyStack = struct
type 'a t = Empty | Entry of 'a * 'a t
let empty = Empty
let push x s = Entry (x, s)
let peek = function Empty -> failwith "empty" | Entry(x,_) -> x
end
Suppose we wanted to write code that would test a ListStack
:
assert (ListStack.(empty |> push 1 |> peek) = 1)
Unfortunately, to test a MyStack
, we'd have to duplicate that code:
assert (MyStack.(empty |> push 1 |> peek) = 1)
And if we had other stack implementations, we'd have to duplicate the test for them, too. That's not so horrible to contemplate if it's just one test case for a couple implementations, but if it's hundreds of tests for even a couple implementations, that's just too much duplication to be good software engineering.
Functors offer a better solution. We can write a functor that is parameterized on the stack implementation, and produces the test for that implementation:
module StackTester (S:StackSig) = struct
assert (S.(empty |> push 1 |> peek) = 1)
end
module MyStackTester = StackTester(MyStack)
module ListStackTester = StackTester(ListStack)
Now we can factor out all our tests into the functor StackTester
, and
when we apply that functor to a stack implementation, we get a set of
tests for that implementation. Of course, this would work with OUnit
as well as assertions.
Back to the example where include didn't suffice
Earlier, we tried to add a function add_all
to both ListSetNoDups
and ListSetDups
without having any duplicated code, but we didn't
totally succeed. Now let's really do it right.
The problem we had earlier was that we needed to parameterize the
implementation of add_all
on the add
function in the set
data structure. We can accomplish that parameterization with
a functor.
Here is a functor that takes in a structure named S
that matches the Set
signature, then produces a new structure having a single function named
add_all
in it:
module AddAll(S:Set) = struct
let add_all lst set =
let add' s x = S.add x s in
List.fold_left add' set lst
end
Notice how the functor, in its body, uses S.add
. It takes the implementation
of add
from S
and uses it to implement add_all
, thus solving the
exact problem we had before when we tried to use includes.
When we apply AddAll
to our set implementations, we get structures
containing an add_all
function for each implementation:
# module AddAllListSetDups = AddAll(ListSetDups);;
module AddAllListSetDups :
sig
val add_all : 'a list -> 'a ListSetDups.t -> 'a ListSetDups.t
end
# module AddAllListSetNoDups = AddAll(ListSetNoDups);;
module AddAllListSetNoDups :
sig
val add_all : 'a list -> 'a ListSetNoDups.t -> 'a ListSetNoDups.t
end
So the functor has enabled the code reuse we couldn't get before:
we now can implement a single add_all
function and from it derive
implementations for two different set structures.
But that's the only function those two structures contain. Really
what we want is a full set implementation that also contains the
add_all
function. We can get that by combining includes with functors:
module ExtendSet(S:Set) = struct
include S
let add_all lst set =
let add' s x = S.add x s in
List.fold_left add' set lst
end
That functor takes a set structure as input, and produces a structure that contains
everything from that set structure (because of the include
) as well as
a new function add_all
that is implemented using the add
function from the
set.
When we apply the functor, we get a very nice set data structure as a result:
# module ListSetNoDupsExtended = ExtendSet(ListSetNoDups);;
module ListSetNoDupsExtended :
sig
type 'a t = 'a ListSetNoDups.t
val empty : 'a t
val mem : 'a -> 'a t -> bool
val add : 'a -> 'a t -> 'a t
val elts : 'a t -> 'a list
val add_all : 'a list -> 'a t -> 'a t
end
Notice how the output structure records the fact that its type t
is the
same type as the type t
in its input structure. They share it because
of the include
.
Stepping back, what we just did bears more than a passing resemblance
to what you're used to doing in CS 2110 with class extension in Java. We
created a base module and extended its functionality with new code
while preserving its old functionality. But whereas class extension
necessitates that the newly extended class is a subtype of the old,
and that it still has all the old functionality, OCaml functors
are more fine-grained in what they can accomplish. We can choose
whether they include the old functionality. And no subtyping relationships
are necessarily involved. Moreover, the functor we wrote can be used
to extend any set implementation with add_all
, whereas class
extension applies to just a single base class. There are ways
of achieving something similar in Java with mixins, which weren't
supported before Java 1.5.
Standard library Map
The standard library's Map module implements a dictionary data structure using balanced binary trees. You can see the implementation of that module on GitHub as well as its interface.
The Map module defines a functor Make
that creates a structure implementing
a map over a particular type of keys. That type is the input structure
to Make
. The type of that input structure is Map.OrderedType
, which are types that
support a compare
operation:
module type OrderedType = sig
type t
val compare : t -> t -> int
end
The Map module needs ordering because balanced binary trees need to be able
to compare keys to determine whether one is greater than another.
According to the library's documentation, compare
must satisfy this
specification:
(* This is a two-argument function [f] such that
* [f e1 e2] is zero if the keys [e1] and [e2] are equal,
* [f e1 e2] is strictly negative if [e1] is smaller than [e2],
* and [f e1 e2] is strictly positive if [e1] is greater than [e2].
* Example: a suitable ordering function is the generic structural
* comparison function [Pervasives.compare]. *)
val compare : t -> t -> int
Arguably this specification is a missed opportunity for good design: the library designers could instead have defined a variant:
type order = LT | EQ | GT
and required the output type of compare
to be order
. But historically many
languages have used comparison functions with similar specifications, such
as the C standard library's strcmp
function.
The output of Map.Make
is a structure whose type is (almost) Map.S
and supports
all the usual operations we would expect from a dictionary:
module type S =
sig
type key
type 'a t
val empty: 'a t
val mem: key -> 'a t -> bool
val add: key -> 'a -> 'a t -> 'a t
val find: key -> 'a t -> 'a
...
end
There are two reasons why we say that the output is "almost" that type:
The Map module actually specifies a sharing constraint (which we covered in the previous notes):
type key = Ord.t
. That is, the output ofMap.Make
shares itskey
type with the typeOrd.t
. That enables keys to be compared withOrd.compare
. The way that sharing constraint is specified is in the type ofMake
(which can be found inmap.mli
, the interface file for the map compilation unit):module Make : functor (Ord : OrderedType) -> (S with type key = Ord.t)
The Map module actually specifies something called a variance on the representation type, writing
+'a t
instead of'a t
as we did above. We won't concern ourselves with what this means; it's related to subtyping and polymorphic variants.
The functor Map.Make
itself (which can be found in map.ml
, the implementation
file for the map compilation unit) is currently defined as follows, though of course
the library is free to change its internals in the future:
module Make(Ord: OrderedType) = struct
type key = Ord.t
type 'a t =
| Empty
| Node of 'a t * key * 'a * 'a t * int
(* left subtree * key * value * right subtree * height of node *)
let empty = Empty
let rec mem x = function
| Empty -> false
| Node(l, v, _, r, _) ->
let c = Ord.compare x v in
c = 0 || mem x (if c < 0 then l else r)
...
The key
type is defined to be a synonym for the type t
inside Ord
, so
key
values are comparable using Ord.compare
. The mem
function uses
that to compare keys and decide whether to recurse on the left subtree or right
subtree.
Using the Map module
A map for integer keys.
To create a map, we have to pass a structure into Map.Make
, and that structure
has to define a type t
and compare
function. The simplest way to do
that is to pass an anonymous structure into the functor:
# module IntMap = Map.Make(struct type t = int let compare = Pervasives.compare end);;
module IntMap :
sig
type key = int
type 'a t
val empty : 'a t
...
end
# open IntMap;;
# let m1 = add 1 "one" empty;;
val m1 : string t = <abstr>
# find 1 m1;;
- : string = "one"
# mem 42 m1;;
- : bool = false
# find 42 m1;;
Exception: Not_found.
# bindings m1;;
- : (int * string) list = [(1, "one")]
# let m2 = add 1 1. empty;;
val m2 : float t = <abstr>
# bindings m2;;
- : (int * float) list = [(1, 1.)]
Here are some things to note about the utop transcript above:
We can write a structure on one line, even though until now we've always used line breaks to keep them readable. When writing a structure on on line (which we'll only do for really short structures) it can be useful to use the double semicolon between definitions to enhance readability:
# module IntMap = Map.Make(struct type t = int;; let compare = Pervasives.compare end);;
This is an exception to the general style rule of avoiding double semicolon inside source code.
If we didn't want to pass an anonymous structure, we could instead define a module and pass it:
module Int = struct type t = int let compare = Pervasives.compare end module IntMap = Map.Make(Int)
The signature of the structure returned by
Map.Make
records the fact that keys are of typeint
. The type'a t
is the name of the representation type of anIntMap
. The'a
type variable in it is the type of values in the map. Although in general the map could have any value type, once we add a single value to a map, that "pins down" the value type of that particular map. When we add the binding from key1
to string"one
above, notice that the map value returned is of typestring t
.The
bindings
function of a map returns an association list of all the bindings in the map. Association lists are, of course, another data structure that implements a dictionary. But they are less efficient than the balanced binary search tree implementation used byMap
.The
mem
function tests whether a key is a member of a map. Thefind
function finds the value associated with a key, and raises theNot_found
exception if the key is not bound in the map. That's the same exception thatList.assoc
raises if a key is not bound in an association list.
A map for string keys.
If a module already provides a type t
that can be compared, we can
immediately use that module as an argument to Map.Make
. Several
standard library modules are designed to be used in that way.
For example, the String
module defines a type t
and a compare
function
that meet the specification of Map.OrderedType
. So we can easily
create maps whose key type is string
:
# module StringMap = Map.Make(String);;
module StringMap :
sig
type key = string
...
end
Now we could use the string map like we used the int map. This time, for sake
of example, let's not open the StringMap
module:
# let m = StringMap.(add "one" 1 empty);;
# let m' = StringMap.(add "two" 2 m);;
# StringMap.bindings m';;
- : (string * int) list = [("one", 1); ("two", 2)]
# StringMap.bindings m;;
- : (string * int) list = [("one", 1)]
#
Note that maps are a functional data structure: adding a mapping to m
did not mutate m
; rather, it produced a new map that we bound
to m'
, and both the new map and old map remain available for use.
A map for record keys.
When the type of a key becomes more complicated than a built-in primitive
type, we might want to write our own custom comparison function. For
example, suppose we want a map in which keys are records representing names,
and in which names are sorted alphabetically by last name then by first name.
In the code below, we provide a module Name
that can compare records that way:
type name = {first:string; last:string}
module Name = struct
type t = name
let compare {first=first1;last=last1}
{first=first2;last=last2} =
match Pervasives.compare last1 last2 with
| 0 -> Pervasives.compare first1 first2
| c -> c
end
The Name
module can be used as input to Map.Make
because it matches
the Map.OrderedType
signature:
module NameMap = Map.Make(Name)
And now we could add some names to a map. Below, for sake of example, we map some names to birth years, and we use the pipeline operator to easily add multiple bindings one after another:
let k1 = {last="Kardashian"; first="Kourtney"}
let k2 = {last="Kardashian"; first="Kimberly"}
let k3 = {last="Kardashian"; first="Khloe"}
let k4 = {last="West"; first="Kanye"}
let nm = NameMap.(
empty |> add k1 1979 |> add k2 1980
|> add k3 1984 |> add k4 1977)
let lst = NameMap.bindings nm
The value of lst
will be
[({first = "Khloe"; last = "Kardashian"}, 1984);
({first = "Kimberly"; last = "Kardashian"}, 1980);
({first = "Kourtney"; last = "Kardashian"}, 1979);
({first = "Kanye"; last = "West"}, 1977)]
Note how the order of keys in that list is not the same as the order in which
we added them. The list is sorted according to the Name.compare
function we wrote.
Several of the other functions in the Map.S
signature will also process map bindings
in that sorted order—for example, map
, fold
, and iter
.
Code reuse with Map
Stepping back from the mechanics of how to use Map
, let's think about
how it achieves code reuse. The implementor of Map
had a tricky problem
to solve: balanced binary search trees require a way to compare keys, but
the implementor can't know in advance all the different types of keys
that a client of the data structure will want to use. And each type of
key might need its own comparison function. Although the standard library's
Pervasives.compare
can be used to compare any two values of the same
type, the result it returns isn't necessarily what a client will want. For
example, it's not guaranteed to sort names in the way we wanted above.
So the implementor of Map
parameterized it on a structure that bundles
together the type of keys with a function that can be used to compare them.
It's the client's responsibility to implement that structure. Given it,
all the code in Map
can be re-used by the client.
The Java Collections Framework solves a similar problem in the TreeMap class, which has a constructor that takes a Comparator. There, the client has the responsibility of implementing a class for comparisons, rather than a structure. Though the language features are different, the idea is the same.
Summary
Functors are an advanced language feature in OCaml that might seem mysterious at first. If so, keep in mind: they're really just a kind of function that takes a structure as input and returns a structure as output. The reason they don't behave quite like normal OCaml functions is that structures are not first-class values in OCaml: you can't write regular functions that take a structure as input or return a structure as output. But functors can do just that.
Functors and includes enable code reuse. The kinds of code reuse you learned to achieve in CS 2110 with object-oriented features can also be achieved with functors and include. That's not to say that functors and includes are exactly equivalent to those object-oriented features: some kinds of code reuse might be easier to achieve with one set of features than the other.
One way to think about this might be that class extension is a very limited (but very useful) combination of functors and includes: extending a class is like writing a functor that takes the base class as input, includes it, then adds new functions. But functors provide more general capability than class extension, because they can compute arbitrary functions of their input structure, rather than being limited to just certain kinds of extension.
Perhaps the most important idea to get out of studying the OCaml module system is an appreciation for the aspects of modularity that transcend any given language: namespaces, abstraction, and code reuse. Having seen those ideas in a couple very different languages, you're equipped to recognize them more clearly in the next language you learn.
Terms and concepts
- code reuse
- functor
- include
- maintainability
- maps
- modularity
- open
- parameterized structure
- set representations
- signatures
- structures
Further reading
- Introduction to Objective Caml, chapter 13
- Real World OCaml, chapter 9