Introduction to OCaml

Please read these lab guidelines before you proceed with reading this page.

Starting OCaml

What you see below is a one-star exercise named "start". The exercise ends at the square symbol □.

Exercise: start [✭]

In a terminal window, type utop to start the interactive OCaml session, commonly called the toplevel.
Press Control-D to exit the toplevel. You can also enter #quit;; and press return. Note that you must type the # there: it is in addition to the # prompt you already see.

□

The toplevel is like a calculator or command-line interface. It's similar to DrJava, if you used that in CS 2110, or to the interactive Python interpreter, if you used that in CS 1110. It's handy for trying out small pieces of code without going to the trouble of launching the OCaml compiler. But don't get too reliant on it, because creating, compiling, and testing large programs will require more powerful tools.

Some other languages would call the toplevel a REPL, which stands for read-eval-print-loop: it reads programmer input, evaluates it, prints the result, and then repeats.

Types and values

You can enter expressions into the OCaml toplevel. End an expression with a double semicolon ;; and press the return key. OCaml will then evaluate the expression and tell you the resulting value and the value's type. For example:

# 42;;
- : int = 42

Let's dissect that response from utop, reading right to left:

42 is the value.
int is the type of the value.
The value was not given a name, hence the symbol -.

You can bind values to names with a let definition, as follows:

# let x = 42;;
val x : int = 42

Again, let's dissect that response, this time reading left to right:

A value was bound to a name, hence the val keyword.
x is the name to which the value was bound.
int is the type of the value.
42 is the value.

You can pronounce the entire output as "x has type int and equals 42."
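As a small sketch of why naming values is useful, later definitions can refer to names bound earlier. The names y and doubled below are illustrative and do not come from the lab:

```ocaml
(* Bind the value 42 to the name x, as above. *)
let x = 42

(* Later definitions may use earlier names. The names y and doubled
   are hypothetical, chosen only for this illustration. *)
let y = x + 1
let doubled = y * 2
```

Entered into utop one at a time, each followed by ;;, these definitions should each produce a val line in the same format as the one dissected above.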

Exercise: values [✭]

What are the type and value of each of the following OCaml expressions?

7 * (1+2+3)
"CS " ^ string_of_int 3110

Hint: type each expression into the toplevel and it will tell you the answer. Note: ^ is not exponentiation.

□

OCaml operators

Exercise: operators [✭✭]

Examine the table of all operators in the OCaml manual.

Write an expression that multiplies 42 by 10.
Write an expression that divides 3.14 by 2.0. Hint: integer and floating-point operators are written differently in OCaml.
Write an expression that computes 4.2 raised to the seventh power. Note: there is no built-in integer exponentiation operator in OCaml (nor is there in C, by the way), in part because it is not an operation provided by most CPUs.

□

There are two equality operators in OCaml, = and ==, with corresponding inequality operators <> and !=. Operators = and <> examine structural equality whereas == and != examine physical equality. Until we've studied the imperative features of OCaml, the difference between them will be tricky to explain. (See the documentation of Stdlib.(==), which was named Pervasives.(==) in older OCaml releases, if you're curious now.) But what's important now is that you train yourself to use only = and not ==, which might be difficult if you're coming from a language like Java where == is the usual equality operator.

Exercise: equality [✭]

Write an expression that compares 42 to 42 using structural equality.
Write an expression that compares "hi" to "hi" using structural equality. What is the result?
Write an expression that compares "hi" to "hi" using physical equality. What is the result?

□

Exercise: more operators [✭✭, optional]

Familiarize yourself with the rest of the OCaml operators. Write at least one expression with an integer operator, a logical operator, a floating-point operator, a comparison (aka "test") operator, and a Boolean operator.

□

Assertions

The expression assert e evaluates e. If the result is true, nothing more happens, and the entire expression evaluates to a special value called unit. The unit value is written () and its type is unit. But if the result is false, an exception is raised.
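The two outcomes described above can be sketched as follows. This is only an illustration: the names u and raised are hypothetical, and the try ... with form used to observe the exception has not been introduced in this lab.

```ocaml
(* A true assertion evaluates to the unit value (), whose type is unit. *)
let u : unit = assert (1 + 1 = 2)

(* A false assertion raises the exception Assert_failure. The
   try ... with form is used here only to observe that the exception
   was raised; raised ends up true. *)
let raised =
  try
    assert (1 + 1 = 3);
    false
  with Assert_failure _ -> true
```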

Exercise: assert [✭]

Enter assert true;; into utop and see what happens.
Enter assert false;; into utop and see what happens.
Write an expression that asserts 2110 is not (structurally) equal to 3110.

□

If expressions

The expression if e1 then e2 else e3 evaluates to e2 if e1 evaluates to true, and to e3 otherwise. We call e1 the guard of the if expression.

# if 3 + 5 > 2 then "yay!" else "boo!";;
- : string = "yay!"

Unlike if-then-else statements that you may have used in imperative languages, if-then-else expressions in OCaml are just like any other expression; they can be put anywhere an expression can go. That makes them similar to the ternary operator ? : that you might have used in other languages.

# 4 + (if 'a' = 'b' then 1 else 2);;
- : int = 6
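Because an if expression is an ordinary expression, its result can also be bound to a name or used as an operand. A minimal sketch, with the hypothetical names n, parity, and report:

```ocaml
let n = 7

(* The whole if expression is the value bound to parity. *)
let parity = if n mod 2 = 0 then "even" else "odd"

(* An if expression can also appear as an operand of ^,
   here wrapped in parentheses. *)
let report =
  "n is " ^ parity ^ " and " ^ (if n > 5 then "large" else "small")
```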

Exercise: if [✭]

Write an if expression that evaluates to 42 if 2 is greater than 1 and otherwise evaluates to 7.

□

If expressions can be nested in a pleasant way:

if e1 then e2
else if e3 then e4
else if e5 then e6
...
else en

You should regard the final else as mandatory, regardless of whether you are writing a single if expression or a highly nested if expression. If you leave it off, you'll likely get an error message that, for now, is inscrutable:

# if 2>3 then 5;;
Error: This expression has type int but an expression was expected of type unit
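One way to read that error, offered as a sketch of the reasoning and not a full account of the typing rules: when the else is omitted, the missing branch is treated as the unit value (), so the then branch must have type unit as well. The name v and the fallback value 0 below are arbitrary choices for illustration.

```ocaml
(* With no else branch, the then branch must have type unit.
   print_endline returns (), so this is accepted. The guard is false,
   so nothing is printed. *)
let () = if 2 > 3 then print_endline "never printed"

(* When the then branch is an int, supply an else branch of the same
   type. The fallback value 0 is an arbitrary choice. *)
let v = if 2 > 3 then 5 else 0
```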

Functions

A function can be defined at the toplevel using syntax like this:

# let increment x = x+1;;
val increment : int -> int = <fun>

Let's dissect that response:

increment is the identifier to which the value was bound.
int -> int is the type of the value. This is the type of functions that take an int as input and produce an int as output. Think of the arrow -> as a kind of visual metaphor for the transformation of one value into another value—which is what functions do.
The value is a function, which the toplevel chooses not to print (because it has now been compiled and has a representation in memory that isn't easily amenable to pretty printing). Instead, the toplevel prints <fun>, which is just a placeholder to indicate that there is some unprintable function value. Important note: <fun> itself is not a value.

You can "call" functions with syntax like this:

# increment 0;;
- : int = 1
# increment(21);;
- : int = 22
# increment (increment 5);;
- : int = 7

But in OCaml the usual vocabulary is that we "apply" the function rather than "call" it.

Note how OCaml is flexible about whether you write the parentheses or not, and whether you write whitespace or not. One of the challenges of first learning OCaml can be figuring out when parentheses are actually required. So if you find yourself having problems with syntax errors, one strategy is to try adding some parentheses.
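As a sketch of two common cases where parentheses change the meaning, consider the following. The names a, b, and c are hypothetical; increment is redefined so the example is self-contained.

```ocaml
let increment x = x + 1

(* Function application binds more tightly than arithmetic operators,
   so this is parsed as (increment 2) * 3. *)
let a = increment 2 * 3

(* Parentheses make the product the argument instead. *)
let b = increment (2 * 3)

(* A negative literal argument needs parentheses. Without them,
   increment -1 would be parsed as the subtraction increment - 1,
   which is a type error. *)
let c = increment (-1)
```

Here a is 9, b is 7, and c is 0.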

Exercise: double fun [✭]

Using the increment function from above as a guide, define a function double that multiplies its input by 2. For example, double 7 would be 14. Test your function by applying it to a few inputs. Turn those test cases into assertions.

□

Exercise: more fun [✭✭]

Define a function that computes the cube of a floating-point number. Test your function by applying it to a few inputs.
Define a function that computes the sign (1, 0, or -1) of an integer. Use a nested if expression. Test your function by applying it to a few inputs.

□

A function that takes multiple inputs can be defined just by providing additional names for those inputs as part of the let definition. For example, the following function computes the average of three arguments:

let avg3 x y z =
  (x +. y +. z) /. 3.
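A short usage sketch for avg3 follows. The name m is hypothetical; the definition is repeated so the example is self-contained.

```ocaml
let avg3 x y z =
  (x +. y +. z) /. 3.

(* Arguments are supplied one after another, separated by spaces.
   The operators +. and /. work on floats, so each argument must be a
   float literal such as 1. or 1.0, not the integer 1. *)
let m = avg3 1. 2. 3.
```

The toplevel should report the type of avg3 as float -> float -> float -> float, and m is 2.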

Exercise: date fun [✭✭✭]

Define a function that takes an integer d and string m as input and returns true just when d and m form a valid date. Here, a valid date has a month that is one of the following abbreviations: Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sept, Oct, Nov, Dec. And the day must be a number that is between 1 and the minimum number of days in that month, inclusive. For example, if the month is Jan, then the day is between 1 and 31, inclusive, whereas if the month is Feb, then the day is between 1 and 28, inclusive.

How terse (i.e., few and short lines of code) can you make your function? You can definitely do this in fewer than 12 lines.

□

Storing code in files

Using OCaml as a kind of interactive calculator can be fun, but we won't get very far with writing large programs that way. We need to store code in files instead.

Exercise: command line [✭]

Exit the toplevel. Change to your home directory (called ~) using the cd command:

$ cd ~

The $ above indicates the command prompt: you don't actually type it yourself.

List the files in that directory using the ls command:

$ ls

Create a directory called labs using the mkdir command:

$ mkdir labs

Change to that directory:

$ cd labs

□

Exercise: edit, compile, and run [✭✭]

After completing the command line exercise, above, create a file called hello.ml using a text editor. If you're on the VM, launch Atom:

$ atom hello.ml

Atom is a text editor that provides excellent integration with OCaml, including syntax highlighting, auto-completion, and auto-indentation. We recommend that you give it a try. If you are running on the VM instead of natively, it's possible that your hardware might cause Atom to run too slowly to be useful, in which case you can try a different editor, or try to get a native installation working.

Other choices of editors include Sublime, Komodo, Emacs, and Vim, all of which are installed already for you on the VM. Sublime and Komodo provide less integration with OCaml, but still have a modern look and feel. Emacs provides the most sophisticated integration with OCaml, but the editor itself comes with a substantial learning curve, and is not as graphical. Vim is beloved by some for its minimality.

Enter the following code into the file:

print_endline "Hello world!"

Important note: there is no double semicolon ;; at the end of that line of code. The double semicolon is strictly for interactive sessions in the toplevel, so that the toplevel knows you are done entering a piece of code. There's no reason to write it in a .ml file, and we consider it mildly bad style to do so.
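As a separate illustration that is not part of this exercise, here is a sketch of a file containing several definitions with no double semicolons between them. The names greeting and shout are hypothetical; String.uppercase_ascii is a standard library function.

```ocaml
(* Each let definition ends where the next one begins, so no ;;
   is needed between them. *)
let greeting = "Hello world!"

(* Convert a string to upper case using the standard library. *)
let shout s = String.uppercase_ascii s

(* let () = ... evaluates an expression of type unit for its effect;
   here it prints the upper-cased greeting. *)
let () = print_endline (shout greeting)
```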

Save the file and return to the command line. Compile the code:

$ ocamlc -o hello.byte hello.ml

The compiler is named ocamlc. The -o hello.byte option says to name the output executable hello.byte. The executable contains compiled OCaml bytecode. In addition, two other files are produced, hello.cmi and hello.cmo. We don't need to be concerned with those files for now. Run the executable:

$ ./hello.byte

It should print Hello world! and terminate.

Now change the string that is printed to something of your choice. Save the file, recompile, and rerun.

This edit-compile-run cycle between the editor and the command line is something that might feel unfamiliar if you're used to working inside IDEs like Eclipse. Don't worry; it will soon become second nature.

□

Exercise: build [✭✭]

It is good to know how to run the compiler directly, but in larger projects we want to use the OCaml build system to automatically find and link in libraries. Let's try using it:

$ ocamlbuild hello.byte

You will get an error from that command. Don't worry; just keep reading this exercise.

The build system is named ocamlbuild. The file we are asking it to build is the compiled bytecode hello.byte. The build system will automatically figure out that hello.ml is the source code for that desired bytecode.

However, the build system likes to be in charge of the whole compilation process. When it sees leftover files generated by a direct call to the compiler, such as the ones we produced in the previous exercise, it rightly gets nervous and refuses to proceed. If you look at the error message, it says that a script has been generated to clean up from the old compilation. Run that script, and also remove the compiled file:

$ _build/sanitize.sh
$ rm hello.byte

After that, try building again:

$ ocamlbuild hello.byte

That should now succeed. A directory named _build is created; it contains all the compiled code. That's one benefit of the build system over directly running the compiler: instead of polluting your source directory with a bunch of generated files, it creates them cleanly in a separate directory. A file hello.byte is also created; it is actually just a link to the "real" file of that name, which is in the _build directory.

Now run the executable:

$ ./hello.byte

You can now easily clean up all the compiled code:

$ ocamlbuild -clean

That removes the _build directory and hello.byte link, leaving just your source code.

From now on, we'll use the build system rather than directly invoking the compiler.

□

Exercise: editor tutorial [✭✭✭]

Which editor you use is largely a matter of personal preference. Atom, Sublime, and Komodo all provide a modern GUI. Emacs and Vim are more text-based. If you've never tried Emacs or Vim, why not spend 10 minutes with each? There are good reasons why they are beloved by many programmers.

To get started with learning Vim, run vimtutor -g.
To get started with learning Emacs, run emacs then press C-h t, that is, Control+H followed by t.

□

Exercise: master an editor [✭✭✭✭✭, advanced]

You'll be working on this exercise for the rest of your career! Try not to get caught up in any editor wars.

□