Modules and Printing
We'll start by learning about how printing works in OCaml. Then we'll use the OCaml module system to implement some queues and dictionaries. Finally, we'll see how compilation of modules interacts with utop and learn some new utop directives related to modules.
Printing
The Pervasives
module has built-in printing functions for several of
the built-in primitive types: print_char
, print_int
, print_string
,
and print_float
. There's also a print_endline
function, which is
like print_string
but also outputs a newline.
Exercise: hello world [✭]
Type the following in utop:
# print_endline "Hello world!";;
# print_string "Hello world!";;
Notice the difference in output from each.
□
Let's look at the type of those functions:
# print_endline;;
- : string -> unit = <fun>
# print_string;;
- : string -> unit = <fun>
They both take a string as input and return a value of type unit
,
which we haven't seen before. There is only one value of this type,
which is written ()
and is also pronounced "unit". So unit
is like
bool
, except there is one fewer value of type unit
than there is of
bool
. Unit is therefore used when you need to take an argument or
return a value, but there's no interesting value to pass or return. Unit
is often used when you're writing or using code that has side effects.
Printing is an example of a side effect: it changes the world and can't
be undone.
If you want to print one thing after another, you could sequence some print functions using nested let expressions:
let () = print_endline "THIS" in
let () = print_endline "IS" in
print_endline "3110"
The ()
in let () =
is a pattern, the unit pattern, which matches
only the unit value. The entire expression evaluates to whatever
print_endline 3110
returns, which is the unit value ()
, hence the
entire expression has type unit
. But the evaluation has the side
effect of printing THIS IS 3110
over multiple lines of output.
The boilerplate of all the let () = ... in
above is annoying to have
to write! We don't really care about matching the unit value; after
all, there's only one thing it could be. So there's a special syntax
that can be used to chain together multiple functions who return unit.
The expression e1; e2
first evaluates e1
, which should evaluate to
()
, then discards that value, and evaluates e2
. So we could rewrite
the above code as:
print_endline "THIS";
print_endline "IS";
print_endline "3110"
And that is far more idiomatic code.
If e1
does not have type unit
, then e1;e2
will give a warning, because
you are discarding useful values. If that is truly your intent, you can call
the built-in function ignore : 'a -> unit
to convert any value to ()
:
# 3; 5;;
Warning 10: this expression should have type unit.
- : int = 5
# ignore 3; 5;;
- : int = 5
Exercise: print int list rec [✭✭]
Write a function print_int_list : int list -> unit
that
prints its input list, one number per line. For example,
print_int_list [1;2;3]
should result in this output:
1
2
3
Here is some code to get you started:
let rec print_int_list = function
| [] -> ()
| h::t -> (* fill in here *);
print_int_list t
□
Exercise: print int list iter [✭✭]
Write a function print_int_list' : int list -> unit
whose
specification is the same as print_int_list
. Do not
use the keyword rec
in your solution, but instead to use
the List module function List.iter
.
Here is some code to get you started:
let print_int_list' lst =
List.iter (fun x -> (* fill in here *)) lst
□
Queues
Create a file named myqueue.ml
. In it, put the following code:
module ListQueue = struct
type 'a queue = 'a list
let empty = []
let is_empty q = (q = [])
let enqueue x q = q @ [x]
let peek = function
| [] -> failwith "Empty"
| x::_ -> x
let dequeue = function
| [] -> failwith "Empty"
| _::q -> q
end
Launch utop and #use "myqueue.ml"
. You will then be able to
create and use ListQueue
, for example:
# let q = ListQueue.enqueue 1 (ListQueue.enqueue 2 ListQueue.empty);;
# ListQueue.peek q;; (* returns 2, because 2 was enqueued first *)
Exercise: big list queue [✭✭]
Use the following code to create ListQueue
's of exponentially
increasing length: 10, 100, 1000, etc. How big of a queue can you create before
there is a noticeable delay? How big until there's a delay of at least 10 seconds?
(Note: you can abort utop computations with Ctrl-C.)
(* Creates a ListQueue filled with [n] elements. *)
let fill_listqueue n =
let rec loop n q =
if n=0 then q
else loop (n-1) (ListQueue.enqueue n q) in
loop n ListQueue.empty
□
Add the following code to myqueue.ml
:
module TwoListQueue = struct
type 'a queue = {front:'a list; back:'a list}
let empty = {front=[]; back=[]}
let is_empty = function
| {front=[]; back=[]} -> true
| _ -> false
let norm = function
| {front=[]; back} -> {front=List.rev back; back=[]}
| q -> q
let enqueue x q = norm {q with back=x::q.back}
let peek = function
| {front=[]; _} -> None
| {front=x::_; _} -> Some x
let dequeue = function
| {front=[]; _} -> None
| {front=_::xs; back} -> Some (norm {front=xs; back})
end
Exercise: big two-list queue [✭✭]
Use the following function to again create queues of exponentially increasing length:
let fill_twolistqueue n =
let rec loop n q =
if n=0 then q
else loop (n-1) (TwoListQueue.enqueue n q) in
loop n TwoListQueue.empty
Now how big of a queue can you create before there's a delay of at least 10 seconds?
□
If we compare ListQueue
and TwoListQueue
, it's hopefully
clear that ListQueue
is a simple and correct implementation of
a queue data structure. It's probably far less clear that TwoListQueue
is
a correct implementation! For more details about these two implementations,
look at the code associated with this lecture and lab.
One of the additional exercises at the end of this lab asks you to
explore the efficiencies of these two implementations a bit further.
Dictionaries
A dictionary maps keys to values. This data structure typically supports at least a lookup operation that allows you to find the value with a corresponding key, an insert operation that lets you create a new dictionary with an extra key included. And there needs to be a way of creating an empty dictionary.
We might represent this in a Dictionary
module type as follows:
module type Dictionary = sig
type ('k, 'v) t
(* The empty dictionary *)
val empty : ('k, 'v) t
(* [insert k v d] produces a new dictionary [d'] with the same mappings
* as [d] and also a mapping from [k] to [v], even if [k] was already
* mapped in [d]. *)
val insert : 'k -> 'v -> ('k,'v) t -> ('k,'v) t
(* [lookup k d] returns the value associated with [k] in [d].
* raises: [Not_found] if [k] is not mapped to any value in [d]. *)
val lookup : 'k -> ('k,'v) t -> 'v
end
Note how the type Dictionary.t
is parameterized on two types, 'k
and 'v
,
which are written in parentheses and separated by commas. Although ('k,'v)
might look like a pair of values, it is not: it is a syntax for writing
multiple type variables.
We have seen already in this class that an association list can be used as a simple implementation of a dictionary. For example, here is an association list that maps some well-known names to an approximation of their numeric value:
[("pi", 3.14); ("e", 2.718); ("phi", 1.618)]
Let's try implementing the Dictionary
module type with a module
called AssocListDict
.
Exercise: AssocListDict [✭✭]
Put the code for Dictionary
above into a file named dict.ml
.
Put this code after it:
module AssocListDict = struct
type ('k, 'v) t = ('k * 'v) list
let empty = failwith "Unimplemented"
let insert k v d = failwith "Unimplemented"
let lookup k d = failwith "Unimplemented"
end
Finish the implementation of each of those three definitions.
For lookup
, use the library function List.assoc
.
□
Exercise: AssocListDict encapsulated [✭✭]
Launch utop and type:
# #use "dict.ml";;
# open AssocListDict;;
# let d = insert 1 "one" empty;;
The response you get should be:
val d : (int * string) list = [(1, "one")]
Now close utop. Change the first line of your implementation of
AssocListDict
in dict.ml
to the following:
module AssocListDict : Dictionary = struct
Restart utop and repeat the three phrases above (use,open,let). What is the response you get now? Explain in your own words why the response changed. Your answer should incorporate "abstract type" and "encapsulation".
□
Modules and utop
There are several pragmatics with modules and the toplevel that you need to understand to use them together effectively. The following sequence of exercises, which should be done in order, is designed to help you navigate some of the common pitfalls and confusions that come up.
Compiling an OCaml file produces a module having the same name as the file (but
with the first letter capitalized). These compiled modules can be loaded into
the toplevel using #load
.
Exercise: load [✭]
Create a file called mods.ml
. Inside that file, put the following code:
let b = "bigred"
let inc x = x+1
module M = struct
let y = 42
end
At the command line, type ocamlbuild mods.byte
to compile it. Type
ls _build
to see the files that ocamlbuild
produced. One of them
is mods.cmo
: this is a compiled module object file,
aka bytecode.
Now run utop
, and give it this directive (recall that the #
character is
required in front of a directive, it is not part of the prompt):
# #directory "_build";;
That tells utop to add the _build
directory to the path in which it
looks for compiled (and source) files.
Now give utop this directive:
# #load "mods.cmo";;
There is now a module named Mods
available to be used. Try these
expressions in utop:
# Mods.b;;
# Mods.M.y;;
Now try these:
# inc;; (* will fail *)
# open Mods;;
# inc;;
Finish by exiting utop.
□
If you are doing a lot of testing of a particular module, it can be
annoying to have to type those directives (#directory
and #load
)
every time you start utop. The solution is to create a file
in the working directory and call that file .ocamlinit
. Note
that the .
at the front of that filename is required and makes
it a hidden file that won't appear in directory listings
unless explicitly requested (e.g., with ls -a
). Everything
in .ocamlinit
will be processed by utop when it loads.
Exercise: ocamlinit [✭]
Using your text editor, create a file named .ocamlinit
in the same
directory as mods.ml
. In that file, put the following:
#directory "_build";;
#load "mods.cmo";;
open Mods
Now restart utop. All the names defined in Mods
will already be in scope.
For example, these will both succeed:
# inc;;
# M.y;;
□
If you are building code that depends on third-party libraries, you can load those libraries with another directive:
Exercise: require [✭]
Add the following lines to the end of mods.ml
:
open OUnit2
let test = "testb" >:: (fun _ -> assert_equal "bigred" b)
Try to recompile the module with ocamlbuild mods.byte
. That
will fail, because you need to tell the build system to include
the third-party library OUnit. So try to recompile with
ocamlbuild -pkg oUnit mods.byte
. That will succeed.
Now try to restart utop. If you look closely, there will be an error message:
File ".ocamlinit", line 1:
Error: Reference to undefined global `OUnit2'
The problem is that the OUnit library hasn't been loaded into utop yet. Type the following directive:
#require "oUnit";;
Now you can successfully load your own module without getting an error.
#load "mods.cmo";;
Quit utop. Add the #require
directive to .ocamlinit
anywhere before the
#load
directive. Restart utop. Verify that inc
is in scope.
□
When compiling a file, the build system automatically figures out which other files it depends on, and recompiles those as necessary. The toplevel, however, is not as sophisticated: you have to make sure to load all the dependencies of a file, as we'll see in the next exercise.
Exercise: loads [✭]
Create a file mods2.ml
. In it, put this code:
open Mods
let x = inc 0
Run ocamlbuild -pkg oUnit mods2.byte
. Notice that you don't have to
name mods.byte
, even though mods2.ml
depends on the module Mod
.
The build system is smart that way.
Go back to .ocamlinit
and change it be to just the following:
#directory "_build";;
#require "oUnit";;
Restart utop. Try this directive:
# #load "mods2.cmo";;
Error: Reference to undefined global `Mods'
The toplevel did not automatically load the modules that Mods2
depends upon.
You can do that manually:
# #load "mods.cmo";;
# #load "mods2.cmo";;
Or you can tell the toplevel to load Mods2
and recursively to load everything
depends on:
# #load_rec "mods2.cmo";;
□
There is a big difference between #load
-ing a compiled module file and #use
-ing
an uncompiled source file. The former loads bytecode and makes it available for use.
For example, loading mods.cmo
caused the Mod
module to be available,
and we could access its members with expressions like Mod.b
.
The latter (#use
) is textual inclusion: it's like typing the contents of
the file directly into the toplevel. So using mods.ml
does not cause
a Mod
module to be available, and the definitions in the file
can be accessed directly, e.g., b
. Let's try that out...
Exercise: load vs. use [✭]
Start a new toplevel, first making sure your .ocamlinit
does not contain any
#load
directives. We already know from the previous exercise that
we could load first mods.cmo
then mods2.cmo
. Let's try using the source
code:
# #use "mods.ml" (* will succeed *)
# b;;
val b : string = "bigred"
# #use "mods2.ml" (* will fail *)
Error: Reference to undefined global `Mods'
The problem is that #use
-ing mods.ml
did not introduce a module into scope.
Rather it directly put the definitions found in that source file into scope.
□
So when you're using the toplevel to experiment with your code, it's often
better to work with #load
, because this accurately reflects how your modules
interact with each other and with the outside world, rather than #use
them.
Additional exercises
Exercise: queue efficiency [✭✭✭]
Compare the implementations of enqueue
in ListQueue
vs.
TwoListQueue
. Explain in your own words why the efficiency of
ListQueue.enqueue
is linear time in the length of the queue. Hint:
consider the @
operator. Then explain why adding n elements to
the queue takes time that is quadratic in n.
Now consider TwoListQueue.enqueue
. Suppose that the queue is in a
state where it has never had any elements dequeued. Explain in your own
words why TwoListQueue.enqueue
is constant time. Then explain why
adding n elements to the queue takes time that is linear in
n.
(Note: the enqueue and dequeue operations for TwoListQueue
remain
constant time even after interleaving them, but showing why that is so
require the study of amortized analysis, which we will not cover here.)
□
Exercise: binary search tree dictionary [✭✭✭]
Write a module BstDict
that implements the Dictionary
module type using the tree
type and lookup
and insert
functions that you defined in the lab on variants.
□
Exercise: fraction [✭✭✭✭]
Write a module that implements the Fraction
module type below:
module type Fraction = sig
(* A fraction is a rational number p/q, where q != 0.*)
type t
(* [make n d] is n/d. Requires d != 0. *)
val make : int -> int -> t
val numerator : t -> int
val denominator : t -> int
val toString : t -> string
val toReal : t -> float
val add : t -> t -> t
val mul : t -> t -> t
end
Your module should ensure these invariants for every value v
of type t
that it returns from make
, add
, and mul
:
v
is in reduced formthe denominator of
v
is positive
For the first invariant, you might find this implementation of Euclid's algorithm to be helpful:
(* [gcd x y] is the greatest common divisor of [x] and [y].
* requires: [x] and [y] are positive.
*)
let rec gcd (x:int) (y:int) : int =
if x = 0 then y
else if (x < y) then gcd (y - x) x
else gcd y (x - y)