Modules and Printing

We'll start by learning about how printing works in OCaml. Then we'll use the OCaml module system to implement some queues and dictionaries. Finally, we'll see how compilation of modules interacts with utop and learn some new utop directives related to modules.

Printing

The Pervasives module has built-in printing functions for several of the built-in primitive types: print_char, print_int, print_string, and print_float. There's also a print_endline function, which is like print_string but also outputs a newline.

Exercise: hello world [✭]

Type the following in utop:

# print_endline "Hello world!";;
# print_string "Hello world!";;

Notice the difference in output from each.

□

Let's look at the type of those functions:

# print_endline;;
- : string -> unit = <fun>

# print_string;;
- : string -> unit = <fun>

They both take a string as input and return a value of type unit, which we haven't seen before. There is only one value of this type, which is written () and is also pronounced "unit". So unit is like bool, except there is one fewer value of type unit than there is of bool. Unit is therefore used when you need to take an argument or return a value, but there's no interesting value to pass or return. Unit is often used when you're writing or using code that has side effects. Printing is an example of a side effect: it changes the world and can't be undone.

If you want to print one thing after another, you could sequence some print functions using nested let expressions:

let () = print_endline "THIS" in
let () = print_endline "IS" in
print_endline "3110"

The () in let () = is a pattern, the unit pattern, which matches only the unit value. The entire expression evaluates to whatever print_endline 3110 returns, which is the unit value (), hence the entire expression has type unit. But the evaluation has the side effect of printing THIS IS 3110 over multiple lines of output.

The boilerplate of all the let () = ... in above is annoying to have to write! We don't really care about matching the unit value; after all, there's only one thing it could be. So there's a special syntax that can be used to chain together multiple functions who return unit. The expression e1; e2 first evaluates e1, which should evaluate to (), then discards that value, and evaluates e2. So we could rewrite the above code as:

print_endline "THIS";
print_endline "IS";
print_endline "3110"

And that is far more idiomatic code.

If e1 does not have type unit, then e1;e2 will give a warning, because you are discarding useful values. If that is truly your intent, you can call the built-in function ignore : 'a -> unit to convert any value to ():

# 3; 5;;
Warning 10: this expression should have type unit.                                                                                                                      
- : int = 5

# ignore 3; 5;;
- : int = 5

Exercise: print int list rec [✭✭]

Write a function print_int_list : int list -> unit that prints its input list, one number per line. For example, print_int_list [1;2;3] should result in this output:

1
2
3

Here is some code to get you started:

let rec print_int_list = function 
| [] -> () 
| h::t -> (* fill in here *); 
          print_int_list t

□

Exercise: print int list iter [✭✭]

Write a function print_int_list' : int list -> unit whose specification is the same as print_int_list. Do not use the keyword rec in your solution, but instead to use the List module function List.iter. Here is some code to get you started:

let print_int_list' lst = 
  List.iter (fun x -> (* fill in here *)) lst

□

Queues

Create a file named myqueue.ml. In it, put the following code:

module ListQueue = struct
  type 'a queue = 'a list

  let empty = []

  let is_empty q = (q = [])

  let enqueue x q = q @ [x] 

  let peek = function
    | []   -> failwith "Empty"
    | x::_ -> x

  let dequeue = function
    | []   -> failwith "Empty"
    | _::q -> q
end

Launch utop and #use "myqueue.ml". You will then be able to create and use ListQueue, for example:

# let q = ListQueue.enqueue 1 (ListQueue.enqueue 2 ListQueue.empty);;
# ListQueue.peek q;;  (* returns 2, because 2 was enqueued first *)

Exercise: big list queue [✭✭]

Use the following code to create ListQueue's of exponentially increasing length: 10, 100, 1000, etc. How big of a queue can you create before there is a noticeable delay? How big until there's a delay of at least 10 seconds? (Note: you can abort utop computations with Ctrl-C.)

(* Creates a ListQueue filled with [n] elements. *)
let fill_listqueue n =
  let rec loop n q =
    if n=0 then q
    else loop (n-1) (ListQueue.enqueue n q) in
  loop n ListQueue.empty

□

Add the following code to myqueue.ml:

module TwoListQueue = struct
  type 'a queue = {front:'a list; back:'a list}

  let empty = {front=[]; back=[]}

  let is_empty = function
    | {front=[]; back=[]} -> true
    | _ -> false

  let norm = function
    | {front=[]; back} -> {front=List.rev back; back=[]}
    | q -> q

  let enqueue x q = norm {q with back=x::q.back} 

  let peek = function 
    | {front=[]; _} -> None
    | {front=x::_; _} -> Some x

  let dequeue = function
    | {front=[]; _} -> None
    | {front=_::xs; back} -> Some (norm {front=xs; back})
end

Exercise: big two-list queue [✭✭]

Use the following function to again create queues of exponentially increasing length:

let fill_twolistqueue n =
  let rec loop n q =
    if n=0 then q
    else loop (n-1) (TwoListQueue.enqueue n q) in
  loop n TwoListQueue.empty

Now how big of a queue can you create before there's a delay of at least 10 seconds?

□

If we compare ListQueue and TwoListQueue, it's hopefully clear that ListQueue is a simple and correct implementation of a queue data structure. It's probably far less clear that TwoListQueue is a correct implementation! For more details about these two implementations, look at the code associated with this lecture and lab. One of the additional exercises at the end of this lab asks you to explore the efficiencies of these two implementations a bit further.

Dictionaries

A dictionary maps keys to values. This data structure typically supports at least a lookup operation that allows you to find the value with a corresponding key, an insert operation that lets you create a new dictionary with an extra key included. And there needs to be a way of creating an empty dictionary.

We might represent this in a Dictionary module type as follows:

module type Dictionary = sig
  type ('k, 'v) t

  (* The empty dictionary *)
  val empty  : ('k, 'v) t

  (* [insert k v d] produces a new dictionary [d'] with the same mappings 
   * as [d] and also a mapping from [k] to [v], even if [k] was already 
   * mapped in [d]. *)
  val insert : 'k -> 'v -> ('k,'v) t -> ('k,'v) t

  (* [lookup k d] returns the value associated with [k] in [d].  
   * raises:  [Not_found] if [k] is not mapped to any value in [d]. *)
  val lookup  : 'k -> ('k,'v) t -> 'v
end

Note how the type Dictionary.t is parameterized on two types, 'k and 'v, which are written in parentheses and separated by commas. Although ('k,'v) might look like a pair of values, it is not: it is a syntax for writing multiple type variables.

We have seen already in this class that an association list can be used as a simple implementation of a dictionary. For example, here is an association list that maps some well-known names to an approximation of their numeric value:

[("pi", 3.14); ("e", 2.718); ("phi", 1.618)]

Let's try implementing the Dictionary module type with a module called AssocListDict.

Exercise: AssocListDict [✭✭]

Put the code for Dictionary above into a file named dict.ml. Put this code after it:

module AssocListDict = struct
  type ('k, 'v) t = ('k * 'v) list

  let empty = failwith "Unimplemented"

  let insert k v d = failwith "Unimplemented"

  let lookup k d = failwith "Unimplemented"
end

Finish the implementation of each of those three definitions. For lookup, use the library function List.assoc.

□

Exercise: AssocListDict encapsulated [✭✭]

Launch utop and type:

# #use "dict.ml";;
# open AssocListDict;;
# let d = insert 1 "one" empty;;

The response you get should be:

val d : (int * string) list = [(1, "one")]

Now close utop. Change the first line of your implementation of AssocListDict in dict.ml to the following:

module AssocListDict : Dictionary = struct

Restart utop and repeat the three phrases above (use,open,let). What is the response you get now? Explain in your own words why the response changed. Your answer should incorporate "abstract type" and "encapsulation".

□

Modules and utop

There are several pragmatics with modules and the toplevel that you need to understand to use them together effectively. The following sequence of exercises, which should be done in order, is designed to help you navigate some of the common pitfalls and confusions that come up.

Compiling an OCaml file produces a module having the same name as the file (but with the first letter capitalized). These compiled modules can be loaded into the toplevel using #load.

Exercise: load [✭]

Create a file called mods.ml. Inside that file, put the following code:

let b = "bigred"
let inc x = x+1
module M = struct
  let y = 42
end

At the command line, type ocamlbuild mods.byte to compile it. Type ls _build to see the files that ocamlbuild produced. One of them is mods.cmo: this is a compiled module object file, aka bytecode.

Now run utop, and give it this directive (recall that the # character is required in front of a directive, it is not part of the prompt):

# #directory "_build";;

That tells utop to add the _build directory to the path in which it looks for compiled (and source) files.

Now give utop this directive:

# #load "mods.cmo";;

There is now a module named Mods available to be used. Try these expressions in utop:

# Mods.b;;
# Mods.M.y;;

Now try these:

# inc;;  (* will fail *)
# open Mods;;
# inc;;

Finish by exiting utop.

□

If you are doing a lot of testing of a particular module, it can be annoying to have to type those directives (#directory and #load) every time you start utop. The solution is to create a file in the working directory and call that file .ocamlinit. Note that the . at the front of that filename is required and makes it a hidden file that won't appear in directory listings unless explicitly requested (e.g., with ls -a). Everything in .ocamlinit will be processed by utop when it loads.

Exercise: ocamlinit [✭]

Using your text editor, create a file named .ocamlinit in the same directory as mods.ml. In that file, put the following:

#directory "_build";;
#load "mods.cmo";;
open Mods

Now restart utop. All the names defined in Mods will already be in scope. For example, these will both succeed:

# inc;;
# M.y;;

□

If you are building code that depends on third-party libraries, you can load those libraries with another directive:

Exercise: require [✭]

Add the following lines to the end of mods.ml:

open OUnit2
let test = "testb" >:: (fun _ -> assert_equal "bigred" b)

Try to recompile the module with ocamlbuild mods.byte. That will fail, because you need to tell the build system to include the third-party library OUnit. So try to recompile with ocamlbuild -pkg oUnit mods.byte. That will succeed.

Now try to restart utop. If you look closely, there will be an error message:

File ".ocamlinit", line 1:
Error: Reference to undefined global `OUnit2'

The problem is that the OUnit library hasn't been loaded into utop yet. Type the following directive:

#require "oUnit";;

Now you can successfully load your own module without getting an error.

#load "mods.cmo";;

Quit utop. Add the #require directive to .ocamlinit anywhere before the #load directive. Restart utop. Verify that inc is in scope.

□

When compiling a file, the build system automatically figures out which other files it depends on, and recompiles those as necessary. The toplevel, however, is not as sophisticated: you have to make sure to load all the dependencies of a file, as we'll see in the next exercise.

Exercise: loads [✭]

Create a file mods2.ml. In it, put this code:

open Mods
let x = inc 0

Run ocamlbuild -pkg oUnit mods2.byte. Notice that you don't have to name mods.byte, even though mods2.ml depends on the module Mod. The build system is smart that way.

Go back to .ocamlinit and change it be to just the following:

#directory "_build";;
#require "oUnit";;

Restart utop. Try this directive:

# #load "mods2.cmo";;
Error: Reference to undefined global `Mods'

The toplevel did not automatically load the modules that Mods2 depends upon. You can do that manually:

# #load "mods.cmo";;
# #load "mods2.cmo";;

Or you can tell the toplevel to load Mods2 and recursively to load everything depends on:

# #load_rec "mods2.cmo";;

□

There is a big difference between #load-ing a compiled module file and #use-ing an uncompiled source file. The former loads bytecode and makes it available for use. For example, loading mods.cmo caused the Mod module to be available, and we could access its members with expressions like Mod.b. The latter (#use) is textual inclusion: it's like typing the contents of the file directly into the toplevel. So using mods.ml does not cause a Mod module to be available, and the definitions in the file can be accessed directly, e.g., b. Let's try that out...

Exercise: load vs. use [✭]

Start a new toplevel, first making sure your .ocamlinit does not contain any #load directives. We already know from the previous exercise that we could load first mods.cmo then mods2.cmo. Let's try using the source code:

# #use "mods.ml" (* will succeed *)

# b;;
val b : string = "bigred"

# #use "mods2.ml" (* will fail *)
Error: Reference to undefined global `Mods'

The problem is that #use-ing mods.ml did not introduce a module into scope. Rather it directly put the definitions found in that source file into scope.

□

So when you're using the toplevel to experiment with your code, it's often better to work with #load, because this accurately reflects how your modules interact with each other and with the outside world, rather than #use them.

Additional exercises

Exercise: queue efficiency [✭✭✭]

Compare the implementations of enqueue in ListQueue vs. TwoListQueue. Explain in your own words why the efficiency of ListQueue.enqueue is linear time in the length of the queue. Hint: consider the @ operator. Then explain why adding $n$ elements to the queue takes time that is quadratic in $n$ .

Now consider TwoListQueue.enqueue. Suppose that the queue is in a state where it has never had any elements dequeued. Explain in your own words why TwoListQueue.enqueue is constant time. Then explain why adding $n$ elements to the queue takes time that is linear in $n$ .

(Note: the enqueue and dequeue operations for TwoListQueue remain constant time even after interleaving them, but showing why that is so require the study of amortized analysis, which we will not cover here.)

□

Exercise: binary search tree dictionary [✭✭✭]

Write a module BstDict that implements the Dictionary module type using the tree type and lookup and insert functions that you defined in the lab on variants.

□

Exercise: fraction [✭✭✭✭]

Write a module that implements the Fraction module type below:

module type Fraction = sig
  (* A fraction is a rational number p/q, where q != 0.*)
  type t

  (* [make n d] is n/d. Requires d != 0. *)
  val make : int -> int -> t

  val numerator   : t -> int
  val denominator : t -> int
  val toString    : t -> string
  val toReal      : t -> float

  val add : t -> t -> t
  val mul : t -> t -> t
end

Your module should ensure these invariants for every value v of type t that it returns from make, add, and mul:

v is in reduced form
the denominator of v is positive

For the first invariant, you might find this implementation of Euclid's algorithm to be helpful:

(* [gcd x y] is the greatest common divisor of [x] and [y].
 * requires: [x] and [y] are positive.
 *)
let rec gcd (x:int) (y:int) : int =
  if x = 0 then y
  else if (x < y) then gcd (y - x) x
  else gcd y (x - y)