(* THERE'S NO IMPLEMENTATION OF THIS *) module type COLLECTION = sig type 'a collection (* These functions are nearly always present*) val empty: 'a collection (* what parameters might this have? *) val map: ('a -> 'b) -> 'a collection -> 'b collection val app: ('a -> unit) -> 'a collection -> unit (* If the collection is ordered, there may be foldl and foldr *) val fold: ('a -> 'b -> 'a) -> 'a -> 'b collection -> 'a (* These functions are sometimes present *) val size: 'a collection -> int val exists: ('a -> bool) -> 'a collection -> bool val find: ('a -> bool) -> 'a collection -> 'a option val filter: ('a -> bool) -> 'a collection -> 'a collection end
When writing your own collection types, you ought to provide functions like this for the user. When looking to use a collection, you ought to try to use these functions before writing your own equivalent code.
Sets and maps are two common abstract data types; let's look at signatures for each of these, and how the two types relate.
module type SET = sig type 'a set val add: ('a set * 'a) -> 'a set val isMemberOf: ('a set * 'a) -> bool end
A set might also include other operations, such as
union, intersection, difference, isSubset, singleton
and the collection operations above, but all can be derived from
add
and isMemberOf
(and empty
).
module type MAP = sig type ('key, 'value) map val insert: (('key, 'value) map * 'key * 'value) -> ('key, 'value) map val find: (('key, 'value) map * 'key) -> 'value option end
A map might also have an operation to remove a binding. Maps
will probably also include general collection functions - map
,
fold
, etc, but one wrinkle with these is that when iterating
over the map, one might want to look at all keys, all values, or all
key, value pairs. Different map interfaces will provide one or more of
these options.
It is actually possible to implement sets in terms of maps, and maps in terms of sets. Consider the following:
(* sets in terms of maps *) let add (set, item) = insert (map, item, 1) (* or true or item or whatever *) let isMemberOf (set, item) = match find (map, item) with None -> false | Some _ -> true (* maps in terms of sets - very mathematical *) let insert (map, key, value) = add set (key,value) let find (map, key) = member set (key, _)
Note that to make the latter implementation work, we must implement the set so that when it compares entries, it only compares the key part of each key, value pair.
Any implementation of a set or map depends on a way of comparing elements (or in the case of a map, comparing keys). What sort of comparison is required depends on the implementation. Let's consider implementations of a map. For an association list (see Recitation 8), only a notion of equality is needed. For a hashtable, we need both a notion of equality as well as a hash function from our key type to integers. Finally, for a tree-based implementation, we need a comparison function returning a total ordering.
Programming in a purely functional or applicative manner requires the use of immutable data structures. The signatures we have provided so far assume this paradigm. It is also possible to have mutable collections, such as hashtables. In that case, a signature will look more like this:
module MUTABLE_MAP = sig type ('key, 'value) map val insert: (('key, 'value) map * 'key * 'value) -> unit val find: (('key, 'value) map * 'key) -> 'value option end
That is, the insert function, rather than returning a new, immutable map with the new key, instead modifies the existing map. It is often possible to gain an efficiency improvement by using mutable structures - hash tables offer O(1) access and update, assuming the hash function is good. On the other hand, mutable structures are more complex to reason about, both theoretically and practically.
The preceding discussion has all been abstract, but now we will talk about the actual set and map structures available in OCaml. Most of the functionality are available through the use of functors. A functor is a module that takes another module as an argument and produces a module as output. A functor is to a module as a function is to a value. For now, all you need to know is the syntax for this.
module SetOfInts = Set.Make (struct type t = int let compare = compare end)
What's going on here is that we are defining an (anonymous) structure
containing one type (t) and one value (compare), and passing that
structure as an argument to the Make
functor, which is itself contained in the Set
module. The return value
is a module having the signature Set.S
, which we
give the name SetOfInts. The argument of
Set.Make
must conform to the
signature Set.OrderedType
.
The most useful OCaml functors returning sets and maps are Map.Make, Set.Make, and Hashtbl.Make. These offer an efficient functional implementation based on balanced binary trees; we will discuss trees like those they use later in the course.
Docmentation for most of these functors (and all the other standard library modules) is available at http://caml.inria.fr/pub/docs/manual-ocaml/manual034.html. You can easily look up the actual code and interfaces for these modules in your OCaml distribution. The files should be in the stdlib subdirectory of your OCaml distribution, wherever you put the OCaml installation.
One thing to note about these structures is that unlike the examples given so far, they are *not* polymorphic in the type of their key (or item, for sets). Rather, the functor's argument specifies the type. A functorized analogue of the collection signature we began with might look like this:
module type Key = sig type key val equals: key * key -> bool end module type COLLECTION = sig type key type collection val empty: collection val map: (key -> key) -> collection -> collection val app: (key -> unit) -> collection -> unit (* If the collection is ordered, there may be foldl and foldr. * Key would need to have an operation "compare: key*key->order" in that * case. *) val fold: ((key * 'b) -> 'b) -> 'b -> collection -> 'b (* Operations like these are often useful. *) val size: collection -> int val exists: (key -> bool) -> collection -> bool val find: (key -> bool) -> collection -> key option val filter: (key -> bool) -> collection -> collection end
module COLLECTION_FUNCTOR (K : Key) : COLLECTION = struct type key = K.key ... end
This has the advantage that collections don't have to keep track of their own key operations. Also, if you have two collections of the same type, you know that they have the same underlying key comparison function, which helps avoid mistakes. For map abstractions, it is also possible to have a map that is polymorphic with respect to the value type but for which the key type is passed as a functor parameter. Unfortunately, OCaml makes you decide which style you want. But you're still better off with any of these styles than in many languages.
OCaml also contains hash tables, implemented by the Hashtbl module.
OCaml provides both a generic version of
hash tables and a functorial version. The generic version
uses the "magical" built-in Hashtbl.hash
function, which may or
may not be suitable for the data you are using. The functorial version
requires you to provide your own equal
and hash
functions, which allows you to specificy a hash function that you know will
do well for your data.
Note that all the OCaml hash table structures are imperative. Updates overwrite the old data structure.
(* Create an empty hash table with initial size 101 * The initial size should be around the expected number * of elements you will add for best performance, but it * will be automatically increased if it is too low *) let h = Hashtbl.create 101;; Hashtbl.add h "John Doe" "104 East Street";; Hashtbl.add h "Jane Smith" "578 Parker Road";; Hashtbl.find "John Doe";; - : string = "104 East Street" Hashtbl.find "Mary Sue";; Exception: Not_found.
(* Pass in a module for the type we're hashing * In this case, it's integers * And our hash function is just the integer itself *) module H = Hashtbl.Make(struct type t = int let equal n1 n2 = (n1=n2) let hash n = n end) let h = H.create(101);; H.add h 42 "Answer to Life, the Universe, and Everything";; H.add h 1729 "Hardy-Ramanujan number";; H.find h 42;; - : string = "Answer to Life, the Universe, and Everything" H.find h 54;; Exception: Not_found.