Variable declarations in OCaml bind variables within a scope, the part of the program where the variable stands for the value it is bound to. For example, when we write let x = e1 in e2, the scope of the identifier x is the expression e2. Within that scope, the identifier x stands for whatever value v the expression e1 evaluated to. Since x stands for v, OCaml evaluates the let expression by rewriting it to e2, but with the value v substituted for the occurrences of x. For example, the expression let x = 2 in x+3 is evaluated to 2+3, and then arithmetic is used to obtain the result value 5.
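Here is a minimal sketch of this substitution model. The top-level names a and b are chosen only for illustration. The comments trace the rewriting steps, and the second binding shows that the body e2 may itself be another let that uses x:

```ocaml
(* let x = 2 in x + 3  rewrites to  2 + 3, which evaluates to 5 *)
let a = let x = 2 in x + 3

(* The body of the outer let is itself a let that uses x.
   Substituting 2 for x gives  let y = 2 * 10 in 2 + y,
   then 2 + 20, and finally 22. *)
let b = let x = 2 in let y = x * 10 in x + y
```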
Functions also bind variables. When we write a function definition in OCaml, we introduce new variables for the function name and for function arguments. For example, in this expression two variables are bound:
let f(x) = e1 in e2
The scope of the formal parameter x is exactly the expression e1. The scope of the variable f (which is bound to a function value) is the body of the let, e2.
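To see that the parameter's scope is exactly e1, the following sketch defines an unrelated top-level x. The names x and result are illustrative. Inside f, the name x means the parameter, while in the body of the let it means the top-level value:

```ocaml
(* A top-level x, to show that the parameter x of f is a different variable. *)
let x = 10

(* In the body of f, x is the parameter; in the let body, x is the
   top-level 10. So f 5 + x evaluates as 6 + 10. *)
let result = let f (x) = x + 1 in f 5 + x
```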
A let expression can introduce multiple variables at once, as in the following example:
let x = 2 and y = 3 in x + y
Here both x and y have the body of the let as their scope. Even though y is declared after x, the definition of y cannot refer to the x being declared alongside it; that x isn't in scope there.
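As a small illustration, assume there is an enclosing binding of x. In the simultaneous declaration below, y = x picks up that outer x, not the one being declared next to it (the name pair is illustrative):

```ocaml
(* The outer x is 1. In the simultaneous binding, y = x refers to that
   outer x, because the x declared alongside y is not in scope there. *)
let pair = let x = 1 in let x = 2 and y = x in (x, y)
```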
To declare a recursive function, the function must be in scope within its own body. In OCaml, this requires using let rec instead of let. With let rec, every variable it declares is in scope within its own definition and within the definitions of all the other variables. To make this work, the definitions that use these variables must be functions. OCaml does accept a few other restricted forms, but functions are the usual case. For example, here is how we could define mutually recursive functions even and odd:
let rec even(x) = (x = 0 || odd(x-1)) and odd(x) = not(x = 0 || not(even(x-1))) in odd(3110)
There are two variables named x in this example, both of which are in scope only within the respective functions that bind them. However, the variables even and odd are in scope in each other's definitions and within the body of the let.
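The same pair of functions can also be written as a top-level let rec ... and .... In this sketch, odd is rewritten in an equivalent but simpler form. Like the original, it assumes nonnegative arguments:

```ocaml
(* even and odd are mutually recursive, so each is in scope in the other's
   body. odd is the definition above simplified with De Morgan's law:
   not (x = 0 || not (even (x - 1)))  equals  x <> 0 && even (x - 1).
   Both assume a nonnegative argument; a negative one would never reach 0. *)
let rec even (x : int) : bool = x = 0 || odd (x - 1)
and odd (x : int) : bool = x <> 0 && even (x - 1)
```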
It is possible to refer to things defined in a module without using a qualified identifier by using an open declaration:
# String.length "hi";;
- : int = 2
# open String;;
# length "bye";;
- : int = 3
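An open at the toplevel affects the rest of the session. When only one expression needs the module's names, OCaml also offers a local open. The sketch below shows both spellings of it (the names n1 and n2 are illustrative):

```ocaml
(* A local open: the names of String are visible only inside this expression. *)
let n1 = let open String in length "bye"

(* The same idea with the shorter Module.(expr) syntax. *)
let n2 = String.(length "hello")
```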
There are a number of predefined library modules provided by OCaml that are extremely useful. For instance, the String module provides a number of useful operations on strings, and the List module provides operations on lists. Many useful operations are in the Pervasives module (named Stdlib in current versions of OCaml), which is already open by default. To find out more about the OCaml libraries and the operations they provide, see the Objective Caml Reference Manual, part IV. For example, there is a built-in operation for calculating the absolute value of an integer called Pervasives.abs (Stdlib.abs today), which can be named simply as abs.
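Here is a brief sketch of calling library functions with and without qualification. It assumes OCaml 4.07 or later, where the default-open module is named Stdlib. String.uppercase_ascii and List.length are standard-library functions, and the binding names are illustrative:

```ocaml
(* Stdlib is open by default, so abs needs no qualification. Note the
   parentheses: abs -7 would parse as the subtraction abs - 7. *)
let a1 = abs (-7)
let a2 = Stdlib.abs (-7)   (* the same function, fully qualified *)

(* Functions from other modules are used with qualified names. *)
let len = List.length [10; 20; 30]
let up = String.uppercase_ascii "caml"
```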
Take some time to browse through the libraries and find out what they provide. You shouldn't recode something that's available in the library (unless we explicitly ask you to).
We saw that a function with multiple parameters is really just syntactic sugar for a function that is passed a tuple as an argument. For example,
let plus(x,y) = x+y
is sugar for the function
let plus(z: int*int) = (match z with (x,y) -> x+y)
which is in turn sugar for
let plus = fun(z: int*int) -> (match z with (x,y) -> x+y)
When we apply this function, say to the tuple (2,3), evaluation proceeds as follows:
plus(2,3)
= (fun(z: int*int) -> (match z with (x,y) -> x+y)) (2,3)
= (match (2,3) with (x,y) -> x+y)
= 2+3
= 5
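The three equivalent forms can be checked directly. In this sketch, each form gets its own hypothetical name so that all three can coexist:

```ocaml
(* Three spellings of the same function on pairs of ints. *)
let plus_a (x, y) = x + y
let plus_b (z : int * int) = match z with (x, y) -> x + y
let plus_c = fun (z : int * int) -> (match z with (x, y) -> x + y)
```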
It turns out that OCaml has another way to declare functions with multiple formal arguments, and in fact it is the usual way. The above declaration can be given in curried form as follows:
let plus x y = x + y
or with all the types written explicitly:
let plus (x:int) (y:int) :int = x + y
Notice that there is no comma between the parameters. Similarly, when applying a curried function, we write no comma:
plus 2 3 = 2+3 = 5
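To compare the two calling conventions side by side, here is a sketch with both a tupled and a curried version (the names are illustrative). The comments give the types OCaml infers:

```ocaml
let plus_tupled (x, y) = x + y   (* int * int -> int *)
let plus_curried x y = x + y     (* int -> int -> int *)

let r1 = plus_tupled (2, 3)      (* one argument, which is a pair *)
let r2 = plus_curried 2 3        (* two arguments, no comma *)
```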
There is more going on here than it might seem. Recall we said that functions really only have one argument. When we write plus 2 3, the function plus is only being passed one argument, the number 2. We can parenthesize the term as (plus 2) (3), because application is left-associative. In other words, plus 2 must return a function that can be applied to 3 to obtain the result 5. In fact, plus 2 returns a function that adds 2 to its argument.
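Because plus 2 is itself a function, it can be named or passed to other functions. The following short sketch uses List.map from the standard library. The names add2, seven and shifted are illustrative:

```ocaml
let plus x y = x + y

(* plus applied to one argument: a function that adds 2 to its argument. *)
let add2 = plus 2
let seven = add2 5

(* The partially applied function is an ordinary value and can be passed on. *)
let shifted = List.map (plus 2) [1; 2; 3]
```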
How does this work? The curried declaration above is syntactic sugar for the creation of a higher-order function. It stands for:
let plus = function (x:int) -> function (y:int) -> x + y
Evaluation of plus 2 3 proceeds as follows:
plus 2 3
= ((function (x:int) -> function (y:int) -> x + y) 2) 3
= (function (y:int) -> 2 + y) 3
= 2 + 3
= 5
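The desugared form can also be written with fun, which is equivalent to function when there is a single case. The following sketch (names illustrative) checks that both definitions behave alike:

```ocaml
(* The curried declaration and the explicit nested-function form. *)
let plus1 x y = x + y
let plus2 = fun x -> fun y -> x + y

(* Applying plus2 to 2 gives fun y -> 2 + y, which is then applied to 3. *)
let five = (plus2 2) 3
```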
So plus is really a function that takes an int as an argument and returns a new function of type int->int. Therefore, the type of plus is int->(int->int). We can write this simply as int->int->int, because the type operator -> is right-associative.
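One way to confirm that the parentheses change nothing is to annotate two bindings with the two spellings of the type. This sketch (names illustrative) type-checks only because they are the same type. By contrast, (int -> int) -> int would be a different type: a function that takes a function as its argument.

```ocaml
(* These two annotations name the same type, so plus_b can be bound to plus_a. *)
let plus_a : int -> (int -> int) = fun x y -> x + y
let plus_b : int -> int -> int = plus_a
```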
It turns out that we can view binary operators like + as functions, and they are curried just like plus:
# (+);;
- : int -> int -> int = <fun>
# (+) 2 3;;
- : int = 5
# let next = (+) 1;;
val next : int -> int = <fun>
# next 7;;
- : int = 8
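Since (+) is an ordinary curried function, it can be handed to higher-order library functions. Here is a sketch using List.fold_left from the standard library (binding names illustrative):

```ocaml
(* (+) is an ordinary curried function, so it can be passed to List.fold_left. *)
let total = List.fold_left (+) 0 [1; 2; 3; 4]

(* Multiplication is written ( * ) with spaces, because an opening
   parenthesis followed directly by a star begins a comment. *)
let product = List.fold_left ( * ) 1 [1; 2; 3; 4]
```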
So far the only real data structures we can build are made of
tuples. But tuples don't let us make data structures whose
size is not known at compile time. For that we need a new language
feature.
One simple data structure that we're used to is the singly linked list. It turns out that OCaml has lists built in. For example, in OCaml the expression [] is an empty list. The expression [1;2;3] is a list containing three integers. The empty list is often called “nil” in speech, but OCaml has no predefined identifier named nil; the empty list is always written [].
In OCaml, all the elements of a list have to have the same type. For example, a list of integers has the type int list. Similarly, the list ["hi"; "there"; "312"] would have the type string list. But [1; "hi"] is not a legal list. Lists in OCaml are homogeneous lists, as opposed to heterogeneous lists in which each element can have a different type.
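The following sketch puts explicit type annotations on a few lists (names illustrative). The ill-typed list appears only in a comment, because the compiler would reject it:

```ocaml
let empty : int list = []
let ints : int list = [1; 2; 3]
let strs : string list = ["hi"; "there"; "312"]

(* let bad = [1; "hi"]  is rejected by the type checker. *)
```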
Lists are immutable: you cannot change the elements of a list, unlike an array in Java. Once a list is constructed, it never changes.
Often we want to make a list out of smaller lists. We can concatenate two lists with the @ operator. For example, [1;2;3] @ [4;5] = [1;2;3;4;5]. However, this operator isn't very fast. It needs to build a copy of the entire first list, so its running time grows with the length of that list. (It doesn't make a copy of the second list, because the storage of the second list is shared with the storage of the concatenated list.)
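The sharing described above can be observed with physical equality (==), which checks whether two values are the same object in memory. The following sketch assumes the standard-library implementation of @ (names illustrative):

```ocaml
let front = [1; 2; 3]
let back = [4; 5]
let joined = front @ back   (* the list [1; 2; 3; 4; 5] *)

(* Skipping the three copied nodes reaches the very same list as back;
   == tests physical identity, not just structural equality. *)
let shares_back = (List.tl (List.tl (List.tl joined)) == back)
```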
More often when building up lists we use the :: operator, which prepends an element to the front of an existing list (“prepend” means “append onto the front”). The expression 1::[2;3] is 1 prepended onto the list [2;3]. This is just the list [1;2;3]. If we use :: on the empty list, it makes a one-element list: 1::[] = [1].
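Here is a sketch of these uses of ::, with illustrative names. Note that :: associates to the right, so a chain of conses needs no parentheses:

```ocaml
let a = 1 :: [2; 3]        (* the list [1; 2; 3] *)
let b = 1 :: []            (* the one-element list [1] *)

(* :: associates to the right: this is 1 :: (2 :: (3 :: [])). *)
let c = 1 :: 2 :: 3 :: []
```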
For historical reasons going back to the language Lisp, we usually call the :: operator “cons”.
The fact that lists are immutable is in keeping with OCaml being a functional language. Immutability also helps make OCaml more efficient, because it means that different list data structures can share parts of their representation in the computer's memory. For example, evaluating h::t only requires allocating space for a single extra list node in the computer's memory. It shares the rest of the list with the existing list t.
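The sharing can be observed with physical equality (==). In this sketch (names illustrative), the tail of the new list is the very same object as t, and t itself is unchanged:

```ocaml
let t = [2; 3]
let l = 1 :: t                 (* allocates one new list node *)

(* The tail of l is physically the same list as t, not a copy. *)
let shared = (List.tl l == t)

(* Building l did not modify t. *)
let t_unchanged = (t = [2; 3])
```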
The best way to extract elements from a list is to use pattern matching. The operator :: and the bracket constructor can be used as patterns in a match expression. For example, if we wanted to get the value 1 if we had a list of one element, and zero if we had an empty list, we could write:
match lst with [] -> 0 | [x] -> 1
Here, x would be bound to the single element of the list if the second match arm were evaluated. (OCaml warns that this match is not exhaustive, because it does not cover lists with two or more elements.)
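Completed into functions, the example might look like the sketch below. The value 2 returned for longer lists is an arbitrary choice made only so that the match is exhaustive, and the function names are hypothetical:

```ocaml
(* The match from the text, with a wildcard arm added for exhaustiveness. *)
let zero_or_one (lst : int list) : int =
  match lst with
  | [] -> 0
  | [_] -> 1
  | _ -> 2

(* A variant that actually uses the variable bound by the [x] pattern. *)
let only_element_or (default : int) (lst : int list) : int =
  match lst with
  | [x] -> x
  | _ -> default
```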
Often, functions that manipulate lists are recursive, because they need to do something to every element. For example, suppose that we wanted to compute the length of a list of strings. We could write a recursive function that accomplishes this (the library function List.length computes the same thing, for lists of any type):
(* Returns the length of lst *)
let rec length (lst: string list) : int =
  match lst with
  | [] -> 0
  | h :: t -> 1 + length t
The logic here is that if a list is empty ([]), its length is clearly zero. Otherwise, if it is an element h prepended onto another list t, its length must be one greater than the length of t.
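The same recursive shape works for other list computations. As an illustrative sketch, here is a hypothetical sum over an int list. It gives an answer for [], and for h :: t it combines h with the result for t:

```ocaml
(* sum lst is the sum of the elements of lst; the sum of [] is 0. *)
let rec sum (lst : int list) : int =
  match lst with
  | [] -> 0
  | h :: t -> h + sum t
```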
It's possible to write patterns using the bracket syntax. This is exactly the same as writing a similar pattern using the :: operator. For example, the following patterns are all equivalent: [x;2], x::2::[], and x::[2]. These expressions are also all equivalent when used as terms. (Note that x::2::nil is not equivalent: OCaml has no constructor named nil, so in a pattern nil is just a variable that matches any tail.)
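The following sketch checks these equivalences (function names hypothetical). Each function matches a two-element list whose second element is 2, and the final binding compares the three forms used as expressions:

```ocaml
(* Each function returns x for a list [x; 2] and 0 for anything else. *)
let m1 lst = match lst with [x; 2] -> x | _ -> 0
let m2 lst = match lst with x :: 2 :: [] -> x | _ -> 0
let m3 lst = match lst with x :: [2] -> x | _ -> 0

(* Used as expressions, the same three forms build equal lists. *)
let same = ([7; 2] = 7 :: 2 :: []) && ([7; 2] = 7 :: [2])
```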
The OCaml module List contains many useful functions for manipulating lists. Before using lists, it's worth taking a look. Some of them we'll talk about later in more detail. Two functions that should be used with caution are List.hd and List.tl. These functions get the head and tail of a list, respectively. However, they raise an exception (Failure) if applied to an empty list. They make it easy to forget about the possibility that the list might be empty, creating unexpected exceptions that crash your program. So it's usually best to avoid them.
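The following sketch illustrates both points (the helper name head_or is hypothetical). The first binding confirms that List.hd fails on [] with the Failure exception. head_or shows a pattern-matching alternative that forces the caller to decide what an empty list means:

```ocaml
(* List.hd on an empty list raises Failure; this evaluates to true. *)
let hd_raises =
  try ignore (List.hd ([] : int list)); false
  with Failure _ -> true

(* Returns the head of lst, or default when lst is empty. *)
let head_or (default : int) (lst : int list) : int =
  match lst with
  | [] -> default
  | h :: _ -> h
```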
We can use pattern matching to implement other useful functions on lists. Suppose we wanted a function that would extract a list element by its index within the list, with the first element at index zero. We can implement this neatly by doing a pattern match on the list and the integer n at the same time:
(* nth lst n is the nth element of lst, counting from 0. *)
let rec nth (lst: string list) (n: int) : string =
  match lst, n with
  | [], _ -> failwith "nth"
  | h :: _, 0 -> h
  | _ :: t, _ -> nth t (n - 1)
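Here is a related sketch that assumes the built-in option type. The hypothetical nth_opt_sketch returns None rather than failing when the index is out of range, while still matching on the list and the index together. The standard library's List.nth_opt (OCaml 4.05 and later) gives the same results for nonnegative indices. However, List.nth_opt raises Invalid_argument for a negative index, whereas this sketch returns None.

```ocaml
(* nth_opt_sketch lst n is Some of the nth element of lst (counting from 0),
   or None if lst has no such element. The name is hypothetical. *)
let rec nth_opt_sketch (lst : string list) (n : int) : string option =
  match lst, n with
  | [], _ -> None
  | h :: _, 0 -> Some h
  | _ :: t, _ -> nth_opt_sketch t (n - 1)
```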