By convention, data constructors (Penny
, Nickel
, etc. in the
example below) start with capital letters.
There are a few exceptions, such as true
and false
.
type coin = Penny | Nickel | Dime | Quarter
By convention, variables (value
and c
in the example) start
with lower case letters.
let value (c : coin) : float = match c with Penny -> 0.01 | Nickel -> 0.05 | Dime -> 0.10 | Quarter -> 0.25
The example above is a typical match
expression. But consider the following variant. It looks like it should be the same, but when we compile this function, OCaml complains about redundant patterns.
let bad_value (c : coin) : float = let penny = Penny in match c with penny -> 0.01 | Nickel -> 0.05 | Dime -> 0.10 | Quarter -> 0.25
Why? After all, isn't this equivalent to the following?
let bad_value2 (c : coin) : float = let penny = Penny in if c = penny then 0.01 else if c = Nickel then 0.05 else if c = Dime then 0.10 else if c = Quarter then 0.25 else raise (Failure "impossible!")
No! Actually, it's more like
let bad_value2 (c : coin) : float = let penny = Penny in match c with random_variable_name -> 0.01 | Nickel -> 0.05 | Dime -> 0.10 | Quarter -> 0.25
or even
let bad_value3 (c : coin) : float = let penny = Penny in match c with _ -> 0.01 | Nickel -> 0.05 | Dime -> 0.10 | Quarter -> 0.25
In a match expression, if there is a data constructor C on the
left-hand side of the ->
in one of the patterns, then we are
comparing to see if the value of the expression e we are trying to match is
C. But if it is a variable name,
then we're declaring a new, fresh instance of the variable and binding it to
the value of e. So any patterns below this are redundant, because this
match will never fail.
An occurrence of an identifier in an expression
can be either a binding occurrence or a use occurrence.
For example, the following are all examples of binding occurrences of the
identifier id
:
let id = e (* a value declaration *) let id = e in ... (* a value declaration *) let id (arg : s) : t = e (* a function declaration *) let id (arg1 : s1) (arg2 : s2) : t = e (* a function declaration *) match e with id -> ... (* a pattern match *)
In contrast, just about anything else, for example
if id = e then ... else ... id + 3
is a use occurrence of id
. In these occurrences, id
is evaluated, and its value is the value that was bound to it
in the nearest binding occurrence of id
, be it as a function argument,
function declaration, value declaration, or pattern match.
In the example
let bad_value (c : coin) : float = let penny = Penny in match c with penny -> 0.01 | Nickel -> 0.05 | Dime -> 0.10 | Quarter -> 0.25
in the first pattern of the match,
OCaml does not look to see that penny
was previously
bound to Penny
. The occurrence
of penny
in the match expression is also a binding occurrence.
The match will succeed, and it will bind penny
anew to the value of c
.
Why do things this way? The most important reason is that binding identifiers in patterns is an extremely useful device that allows concise, elegant code. It is possible to simultaneously bind several identifiers in one pattern. For example,
match e with Add (x, y) :: t -> x + y | ...
simultaneously binds the three identifiers x
, y
,
and t
, which can then be used in the
expression on the right-hand side. If you do not need a value, use the
wildcard _
in the pattern, which matches anything. For example,
match e with Add (x, y) :: _ -> x + y | ...
If we wanted to allow use occurrences of identifiers in patterns, we would need a way to distinguish them from binding occurrences. Currently there is no way to do this in OCaml.
Here is a puzzle to illustrate the difference between binding and use occurrences. What is the value of the following expression?
(*1*) let f ((x : int), (y : int)) : int = (*2*) let x = y in (*3*) let y = x in (*4*) let (y, x) = (x, y * y) in (*5*) match (y, x) with (*6*) (x, 1) -> 0 (*7*) | (x, y) -> x (*8*) in f (2, 3)
Figuring this out isn't easy, but here's how to
think about it. Let's refer to the values bound to the different
binding occurrences of the variables by their line numbers.
So x1 means the value bound to x
in line 1.
f
,
x
, and
y
. The identifiers x
, and
y
are bound to the arguments to f
supplied in line 8. So x1 = 2 and y1 = 3. The
identifier f
is bound to the function whose
body is given in lines 2–7.x
and
a use occurrence of y
. So x2 = y1 = 3.y
and
a use occurrence of x
. The closest
binding of x
is in line 2. So y3 = x2 = 3.x
and y
to the left of the =
and use occurrences of both identifiers to the right
of the =
. The use occurrences take their values
from the most recent previous bindings. So y4 = x2 = 3
and x4 = y3 * y3 = 9.x
and y
. The values are x4 = 9 and y4 = 3, respectively. It initiates a pattern match on the tuple (y4, x4) = (3, 9).->
is x7 = 3,
which is also the value of the whole expression.Because lists are so useful, OCaml provides a builtin parameterized list
type called list
. It acts just like the List
type that we defined in lecture except that the names of the constructors
are changed. The constructor []
makes an empty list (compare to Nil
)
and the constructor ::
builds a new list by prepending a first
element to another list (compare to Cons
). It is as if
list
were declared as:
type 'a list = [] | :: of 'a * 'a list
(although we can't really do this because of OCaml naming
conventions). The constructor ::
is an infix operator, which is notationally
convenient.
The OCaml interpreter knows how to print out lists nicely as well. The empty
list is printed as []
, and nonempty lists are printed using
brackets with semicolon-separated items. These forms may be used to write
lists as well. Note that []
is a polymorphic value of
type 'a list
; it serves as the empty list for all types T list
.
Here are some examples that show how lists work:
[];; - : 'a list = [] let it = 2 :: [];; val it : int list = [2] let both = 1 :: it;; val both : int list = [1; 2] let both2 = match both with x :: lst -> lst | [] -> [];; val both2 : int list = [2] let both3 = match both2 with x :: lst -> lst | [] -> [];; (* we don't "recover polymorphism" here; it would be unsafe in general *) val both3 : int list = [] both = 1 :: 2 :: [];; (* we can test lists for equality if we can test their elements *) - : bool = true match both with [x; y] -> x + y (* we can use bracket notation in patterns *) | _ -> 0;; - : int = 3 [[]];; - : 'a list list = [[]]
Just like with types, we have to make sure that we write exhaustive
patterns in match
expressions:
match ["hello"; "goodbye"] with s :: _ -> s ^ " hello";; Warning P: this pattern-matching is not exhaustive. Here is an example of a value that is not matched: []
Built-in lists come with lots of useful predefined OCaml library functions, such as the following and many more:
val List.length : 'a list -> int val @ : ('a list * 'a list) -> 'a list (* append two lists *) val List.hd : 'a list -> 'a val List.tl : 'a list -> 'a list val List.nth : ('a list * int) -> 'a
Of course, all of these functions could also be easily implemented for the List
type that we defined ourselves.
We saw two related features of OCaml in class:
Polymorphic values are typically function values, but other polymorphic values
exist, such as []
(and also Nil
, as we defined it).
Datatypes can actually be parameterized with respect to multiple type
parameters. For example, the following type, ortype
, is a type-level
function that accepts a pair of types and yields a new type:
# type ('a, 'b) ortype = Left of 'a | Right of 'b | Both of 'a * 'b;; type ('a, 'b) ortype = Left of 'a | Right of 'b | Both of 'a * 'b # Left 2;; - : (int, 'a) ortype = Left 2 # Right "hello";; - : ('a, string) ortype = Right "hello" # Both (true, 'a');; - : (bool, char) ortype = Both (true, 'a')
Note that the values Left 2
and Right "hello"
are still polymorphic with respect to one type. OCaml always starts counting
type variables from 'a
, hence the - : (int, 'a) ortype
rather than
- : (int, 'b) ortype
in the match for Left 2
above.
Another important standard parameterized type is option
,
which represents the possible presence of a value. It is defined as
follows:
type 'a option = Some of 'a | None
Options are commonly used when no useful value of a particular type makes
sense. This corresponds to some uses of the value null
in Java (None
acts like null
). The difference is that the use of option
is type-safe.
There is no danger of a runtime null pointer exception as long as you use pattern-matching with
option
, because the type system forces you to account for the possibility
of None
.
Pattern matching should be used to check for and extract the value. A
more detailed description of option is available in the OCaml Library
documentation.