CS 3110 Lecture 4
Variant types and polymorphism

Variant types

Lists are very useful, but it turns out they are not really as special as they look. We can implement our own lists, and other more interesting data structures, such as binary trees.

In recitation you should have seen some simple examples of variant types sometimes known as algebraic datatypes or just datatypes. Variant types provide some needed power: the ability to have a variable that contains more than one kind of value.

Unlike tuple types and function types, but like record types, variant types cannot be anonymous; they must be declared with their names. Suppose we had a variable that could be contain yes, no, or maybe. Its type could be declared as a variant type:

# type answer = Yes | No | Maybe;;
type answer = Yes | No | Maybe
# let x: answer = Yes;;
val x: ans = Yes

The variant type is declared with a set of constructors that describe the possible ways to make a value of that type. In this case, we have three constructors: Yes, No, and Maybe. Constructor names must start with a capital letter, and all other names in OCaml cannot start with a capital.

The different constructors can also carry values with them. For example, suppose we want a type that can either be a 2D point or a 3D point. It can be declared as follows:

type eitherPoint = TwoD of float * float
                 | ThreeD of float * float * float

Some examples of values of type eitherPoint are: TwoD(2.1, 3.0) and ThreeD(1.0, 0.0, -1.0).

Suppose we have a value of type eitherPoint. We need to find out which constructor it was made from in order to get at the point data inside. This can be done using matching:

let p: eitherPoint = ...
in
    match p with
      TwoD(x, y) -> ...
    | ThreeD(x, y, z) -> ...

Variant type syntax

We use X as a metavariable to represent the name of a constructor, and T to represent the name of a type. Optional syntactic elements are indicated by brackets []. Then a variant type declaration looks like this in general:

type T= X₁[of t₁] | ... | X_n[of t_n]

Variant types introduce new syntax for terms e, patterns p, and values v:

e ::= ... | X(e) | match e with p₁-> e₁| ... | p_n-> e_n p ::= X | X(x₁:t₁..., x_n:t_n)
v ::= c | (v₁,...,v_n) | fun p-> e| X(v)

Note that the vertical bars in the expression "match e with p₁->e₁| ... | p_n->e_n" are part the syntax of this construct; the other vertical bars (|) are part of the BNF notation.

We can use variant types to define many useful data structures. In fact, the bool is really just a variant type with constructors named true and false.

Implementing integer lists

We can write our own version of lists using variant types. Suppose we want to define values that act like linked lists of integers. A linked list is either empty, or it has an integer followed by another list containing the rest of the list elements. This leads to a very natural variant type declaration:

type intlist = Nil | Cons of (int * intlist)

This type has two constructors, Nil and Cons. It is a recursive type because it mentions itself in its own definition (in the Cons constructor), just like a recursive function is one that mentions itself in its own definition.

Any list of integers can be represented by using this type. For example, the empty list is just the constructor Nil, and Cons corresponds to the operator ::. Here are some examples of lists:

let list1 = Nil 		(* the empty list:  []*)
and list2 = Cons(1,Nil) 	(* the list containing just 1:  [1] *)
and list3 = Cons(2,Cons(1,Nil)) (* the list [2;1] *)
and list4 = Cons(2,list2)       (* also the list [2;1] *)
(* the list [1;2;3;4;5] *)
and list5 = Cons(1,Cons(2,Cons(3,Cons(4,Cons(5,Nil)))))
(* the list [6;7;8;9;10] *)
and list6 = Cons(6,Cons(7,Cons(8,Cons(9,Cons(10,Nil)))))

So we can construct any lists we want. We can also take them apart using pattern matching. For example, our length function above can be written for our lists by just translating the list patterns into the corresponding patterns using constructors:

(* Returns the length of lst *)
let length(lst: intlist): int =
  match lst with
    Nil -> 0
  | Cons(h,t) -> 1 + length(t)

Similarly, we can implement many other functions over lists, as shown in the following examples.

type intlist = Nil | Cons of (int * intlist)

(* test to see if the list is empty *)
let is_empty(xs:intlist):bool = 
    match xs with
      Nil -> true
    | Cons(_,_) -> false

(* Return the number of elements in the list *)
let length(xs:intlist):int = 
    match xs with
      Nil -> 0
    | Cons(i:int,rest:intlist) -> 1 + length(rest)

(* Notice that the match expressions for lists all have the same
 * form -- a case for the empty list (Nil) and a case for a Cons.
 * Also notice that for most functions, the Cons case involves a
 * recursive function call. *)
(* Return the sum of the elements in the list *)
let rec sum(xs:intlist):int = 
    match xs with
      Nil -> 0
    | Cons(i:int,rest:intlist) -> i + sum(rest)

(* Create a string representation of a list *)
let rec toString(xs: intlist):string = 
    match xs with
      Nil -> ""
    | Cons(i:int, Nil) -> Int.toString(i)
    | Cons(i:int, Cons(j:int, rest:intlist)) -> 
       Int.toString(i) ^ "," ^ toString(Cons(j,rest))
    
(* Return the first element (if any) of the list *)
let head(is: intlist):int = 
    match is with
      Nil -> raise(Fail "empty list!")
    | Cons(i,tl) -> i

(* Return the rest of the list after the first element *)
let tail(is: intlist):intlist = 
    match is with
      Nil -> raise Fail("empty list!")
    | Cons(i,tl) -> tl

(* Return the last element of the list (if any) *)
let rec last(is: intlist):int = 
    match is with
      Nil -> raise Fail("empty list!")
    | Cons(i,Nil) -> i
    | Cons(i,tl) -> last(tl)

(* Return the ith element of the list *)
let rec nth (is: intlist) (i:int):int = 
    match (i,is) with
      (_,Nil) -> raise Fail("empty list!")
    | (1,Cons(i,tl)) -> i
    | (n,Cons(i,tl)) ->
	if (n <= 0) then raise Fail("bad index")
	else ith(tl, i - 1)

(* Append two lists:  append([1,2,3],[4,5,6]) = [1,2,3,4,5,6] *)
let rec append(list1:intlist, list2:intlist):intlist = 
    match list1 with
      Nil -> list2
    | Cons(i,tl) -> Cons(i,append(tl,list2))

(* Reverse a list:  reverse([1,2,3]) = [3,2,1].
 * Notice that we compute this by reversing the tail of the
 * list first (e.g., compute reverse([2,3]) = [3,2]) and then
 * append the singleton list [1] to the end to yield [3,2,1]. *)
let rec reverse(list:intlist):intlist = 
    match list with
      Nil -> Nil
    | Cons(hd,tl) -> append(reverse(tl), Cons(hd,Nil)) 

let inc(x:int):int = x + 1;;
let square(x:int):int = x * x;;

(* given [i1,i2,...,in] return [i1+1,i2+1,...,in+n] *)
let rec addone_to_all(list:intlist):intlist = 
    match list with
      Nil -> Nil
    | Cons(hd,tl) -> Cons(inc(hd), addone_to_all(tl))

(* given [i1,i2,...,in] return [i1*i1,i2*i2,...,in*in] *)
let rec square_all(list:intlist):intlist = 
    match list with
      Nil -> Nil
    | Cons(hd,tl) -> Cons(square(hd), square_all(tl))

(* given a function f and [i1,...,in], return [f(i1),...,f(in)].
 * Notice how we factored out the common parts of addone_to_all
 * and square_all. *)
let do_function_to_all(f:int->int, list:intlist):intlist = 
    match list with
      Nil -> Nil
    | Cons(hd,tl) -> Cons(f(hd), do_function_to_all(f,tl))

(* now we can define addone_to_all in terms of do_function_to_all *)
let addone_to_all(list:intlist):intlist = 
    do_function_to_all(inc, list);;

(* same with square_all *)
let square_all(list:intlist):intlist = 
    do_function_to_all(square, list);;

(* given [i1,i2,...,in] return i1+i2+...+in (also defined above) *)
let rec sum(list:intlist):int = 
    match list with
      Nil -> 0
    | Cons(hd,tl) -> hd + sum(tl)

(* given [i1,i2,...,in] return i1*i2*...*in *)
let rec product(list:intlist):int = 
    match list with
      Nil -> 1
    | Cons(hd,tl) -> hd * product(tl)

(* given f, b, and [i1,i2,...,in], return f(i1,f(i2,...,f(in,b))).
 * Again, we factored out the common parts of sum and product. *)
let collapse(f:(int * int) -> int, b:int, list:intlist):int = 
    match list with
      Nil -> b
    | Cons(hd,tl) -> f(hd,collapse(f,b,tl))

(* Now we can define sum and product in terms of collapse *)
let sum(list:intlist):int = 
    let add(i1:int,i2:int):int = i1 + i2
    in 
        collapse(add,0,list)
    end

let product(list:intlist):int = 
    let mul(i1:int,i2:int):int = i1 * i2
    in
        collapse(mul,1,list)
    end

(* Here, we use an anonymous function instead of declaring add and mul.
 * After all, what's the point of giving those functions names if all
 * we're going to do is pass them to collapse? *)
let sum(list:intlist):int = 
    collapse((function (i1:int,i2:int) -> i1+i2),0,list);

let product(list:intlist):int = 
    collapse((function (i1:int,i2:int) -> i1*i2),1,list);

(* And here, we just pass the operators directly... *)
let sum(list:intlist):int = collapse(op +, 0, list);

let product(list:intlist):int = collapse(op *, 1, list);

Representing trees with a recursive type

Trees are another very useful data structure, and unlike lists, they are not built into OCaml. A binary tree is a node containing a value and two children that are trees. A binary tree can also be an empty tree, which we also use to represent the absence of a child node. Just for variety, let's use a record type to represent a tree node. In OCaml we have to define two mutually recursive types, one to represent a tree node, and one to represent a (possibly empty) tree:

type inttree = Empty | Node of node
        and node = { value: int; left: inttree; right: inttree }

The rules on when mutually recursive type declarations are legal is a little tricky. Essentially, any cycle of recursive types must include at least one record or variant type. Since the cycle between inttree and node includes both kinds of types, this declaration is legal.

  2
 / \        Node {value=2; left=Node {value=1; left=Empty; right=Empty},
1   3                      right=Node {value=3; left=Empty; right=Empty}}

Because there are several things stored in a tree node, it's helpful to use a record rather than a tuple to keep them all straight. But a tuple would also have worked.

We can use pattern matching to write the usual algorithms for recursively traversing trees. For example, here is a recursive search over the tree:

(* Return true if the tree contains x. *)
let rec search(t: inttree, x:int): bool =
  match t with
    Empty -> false
  | Node {value=v; left=l; right=r} ->
	v = x || search(l, x) || search(r, x)

Of course, if we knew the tree obeyed the binary search tree invariant, we could have written a more efficient algorithm.

Representing natural numbers with a recursive type

We can even define data structures that act like numbers, demonstrating that we don't really have to have numbers built into OCaml either! A natural number is either the value zero or the successor of some other natural number. This definition leads naturally to the following definition for values that act like natural numbers nat:

type nat = Zero | Succ of nat

This is how you might define the natural numbers in a mathematical logic course. We have defined a new type nat, and Zero and Succ are constructors for values of this type. This type is different than the ones we saw in recitation: the definition of nat refers to nat itself. In other words, this is a recursive type. This allows us to build expressions that have an arbitrary number of nested Succ constructors. Such values act like natural numbers:

let zero = Zero
and one = Succ(Zero)
and two = Succ(Succ(Zero));;
let three = Succ(two);;
let four = Succ(three);;

When we ask the compiler what four represents, we get

- four;
it:nat = Succ (Succ (Succ (Succ Zero)))

Thus four is a nested data structure. The equivalent Java definitions would be

public interface nat { }
public class Zero implements nat { }
public class Succ implements nat { nat v; Succ(nat v) { v = this.v; } }

nat zero = new Zero();
nat one = new Succ(new Zero());
nat two = new Succ(new Succ(new Zero()));
nat three = new Succ(two);
nat four = new Succ(three);

And in fact the Java objects representing the various numbers are actually implemented similarly to the OCaml values representing the corresponding numbers.

Now we can write functions to manipulate values of this type.

fun iszero(n : nat) : bool = 
  match n with
    Zero -> true
  | Succ(m) -> false

The match expression allows us to do pattern matching on expressions. Here we're pattern-matching a value with type nat. If the value is Zero we evaluate to true; otherwise we evaluate to false.

fun pred(n : nat) : nat = 
  match n with
    Zero -> raise Fail "predecessor on zero"
  | Succ(m) -> m

Here we determine the predecessor of a number. If the value of n matches Zero then we raise an exception, since zero has no predecessor in the natural numbers. If the value matches Succ(m) for some value m (which of course also must be of type nat), then we return m.

Similarly we can define a function to add two numbers: (See if the students can come up with this with some coaching.)

fun add(n1:nat, n2:nat) : nat = 
  match n1 with
    Zero -> n2
  | Succ(n_minus_1) -> add(n_minus_1, Succ(n2))

If you were to try evaluating add(four,four), the compiler would respond with:

- add(four,four);
val it = Succ (Succ (Succ (Succ (Succ #)))) : nat

The compiler correctly performed the addition, but it has abbreviated the output because the data structure is nested so deeply. To easily understand the results of our computation, we would like to convert such values to type int:

let rec toInt(n:nat) : int = 
  match n with
    Zero -> 0
  | Succ(n) -> 1 + toInt(n)

That was pretty easy. Now we can write toInt(add(four,four)) and get 8. How about the inverse operation?

let rec toNat(i:int) : nat =
  if i < 0 then raise Fail "toNat on negative number"
  else if i = 0 then Zero
  else Succ(toNat(i-1))

To determine whether a natural number is even or odd, we can write a pair of mutually recursive functions:

let rec even(n:nat) : bool =
  match n with
    Zero -> true
  | Succ(n) -> odd(n)
and odd (n:nat) : bool =
  match n with
    Zero -> false
  | Succ(n) -> even(n)

You have to use the keyword and to combine mutually recursive functions like this. Otherwise the compiler would flag an error when you refer to odd before it has been defined.

Finally we can define multiplication in terms of addition. (See if the students can figure this out.)

let rec mul(n1:nat, n2:nat) : nat =
  match n1 with
    Zero -> Zero
  | Succ(n1MinusOne) -> add(n2, mul(n1MinusOne,n2))

Pattern matching

It turns out that the syntax of ML patterns is richer than what we saw in the last lecture. In addition to new kinds of terms for creating and projecting tuple and record values, and creating and examining variant type values, we also have the ability to match patterns against values to pull them apart into their parts.

When used properly, ML pattern matching leads to concise, clear code. This is because ML pattern matching allows one pattern to appear as a subexpression of another pattern. For example, we see above that Succ(n) is a pattern, but so is Succ(Succ(n)). This second pattern matches only on a value that has the form Succ(Succ(v)) for some value v (that is, the successor of the successor of something), and binds the variable n to that something, v.

Similarly, in our implementation of the nth function, earlier, a neat trick is to use pattern matching to do the if n=0 and the match at the same time. We pattern-match on the tuple (lst, n):

(* Returns the nth element with lst *)
let nth lst n =
  match (lst, n) with
    (h::t, 0) -> h
    (h::t, _) -> nth(t, n-1)
  | ([], _) -> raise(Fail "Can't get nth element of empty list")

Here, we've also added a clause to catch the empty list and raise an exception. We're also using the wildcard pattern _ to match on the n component of the tuple, because we don't need to bind the value of n to another variable—we already have n. We can make this code even shorter; can you see how?

Example: pattern matching on records

Natural numbers aren't quite as good as integers, but we can simulate integers in terms of the naturals by using a representation consisting of a sign and magnitude:

type sign = Pos | Neg
type integer = { sign : sign, mag : nat }

The type keyword simply defines a name for a type. Here we've defined integer to refer to a record type with two fields: sign and mag. Remember that records are unordered, so there is no concept of a "first" field.

The declarations of sign and integer both create new types. The type sign is distinct from any other variant type declared in the program, even if that other type has exactly the same constructor names and types. The type integer behaves similarly. However, it is possible to write type declarations that simply introduce a new name for an existing type. For example, if we wrote type number = int, then the types number and int could be used interchangeably when that declaration were in scope.

We can use the definition of integer to write some integers:

val zero   = {sign=Pos, mag=Zero}
val zero'  = {sign=Neg, mag=Zero}
val one    = {sign=Pos, mag=Succ(Zero)}
val negOne = {sign=Neg, mag=Succ(Zero)}

Now we can write a function to determine the successor of any integer:

  fun inc(i:integer) : integer =
    match i with
      {sign = _, mag = Zero} -> {sign = Pos, mag = Succ(Zero)}
    | {sign = Pos, mag = n} -> {sign = Pos, mag = Succ(n)}
    | {sign = Neg, mag = Succ(n)} -> {sign = Neg, mag = n}

Here we're pattern-matching on a record type. Notice that in the third pattern we are doing pattern matching because the mag field is matched against a pattern itself, Succ(n). Remember that the patterns are tested in order. How does the meaning of this function change if the first two patterns are swapped?

The predecessor function is very similar, and it should be obvious that we could write functions to add, subtract, and multiply integers in this representation.

OCaml syntax

Taking into account the ability to write complex patterns, we can now write down a more complete syntax for OCaml.

syntactic class	syntactic variables and grammar rule(s)	examples
identifiers	x, y	`a`, `x`, `y`, `x_y`, `foo1000`, ...
datatypes, datatype constructors	X, Y	`Nil`, `Cons`, `list`
constants	c	...`~2`, `~1`, `0`, `1`, `2` (integers) `1.0`, `~0.001`, `3.141` (floats) `true`, `false` (booleans) `"hello"`, `""`, `"!"` (strings) `#"A"`, `#" "` (characters)
unary operator	u	`~`, `not`, `size`, ...
binary operators	b	`+`, `*`, `-`, `>`, `<`, `>=`, `<=`, `^`, ...
expressions (terms)	e ::- c \| x \| u e \| e₁ b e₂ \| `if` e₁ `then` e₂ `else` e₃ \| `let` d₁...d_n `in` e `end` \| e `(`e₁`,` ...`,` e_n`)` `\| (`e₁`,`...`,`e_n`)` `\| #`n e \| `{`x₁`=`e₁`,` ...`,` x_n`=`e_n`}` `\| #`x e \| X`(`e`)` \| `match` e `with` p₁`->`e₁\| ... \| p_n`->`e_n	`~0.001`, `foo`, `not` `b`,`2 + 2`, `Cons(2, Nil)`
patterns	p ::= c \| x \| `(`p₁`,`...`,` p_n`)` \| `{`x₁`=` p₁`,`...`,`x_n`=` p_n`}` \| X \| X `(` p `)`	`a:int`, `(x:int,y:int), I(x:int)`
declarations	d ::= `val` p = e \| `fun` y p `:` t - e \| `datatype` Y - X₁ [`of` t₁] `\|` ... `\|` X_n[`of`t_n]	`val one = 1 fun square(x: int): int datatype d - N \| I of int`
types	t ::= `int` \| `float` \| `bool` \| `string` \| `char` \| t₁`->`t₂ \| t₁``...``t_n \| `{`x₁:t₁, x₂:t₂,..., x_n:t_n`}` \| Y	`int`, `string`, `int->int`, `bool*int->bool`
values	v ::= c \| `(`v₁`,`...`,`v_n`)` \| `{`x₁`=`v₁`,` ...`,` x_n`=`v_n`}` \| X`(`v`)`	`2`, `(2,"hello")`, `Cons(2,Nil)`

Note: pattern-matching floating point constants is not possible. So in the production "p ::= c | .." above, c is an integer, boolean, string, or character constant, but not float.

Polymorphism

Type systems are nice but they can get in your way. In a lot of programming languages (e.g., Java) we find that we end up rewriting the same code over and over again so that it works for different types. OCaml doesn't have this problem, but we need to introduce new features to show how to avoid it. Suppose we want to write a function that swaps the position of values in an ordered pair:

let swapInt(x: int, y: int): int*int = (y,x)
and swapReal(x: float, y: float): float*float = (y,x)
and swapString(x: string, y: string): string*string = (y,x)
...

This is tedious, because we're writing exactly the same algorithm each time. It gets worse! What if the two pair elements have different types?

fun swapIntReal(x: int, y: float): float*int = (y,x)
fun swapRealInt(x: float, y: int): int*float = (y,x)

And so on. There has to be a better way... and here it is:

# let swap ((x: 'a), (y: 'b)): 'b * 'a = (y,x):;
val swap : 'a * 'b -> 'b * 'a = <fun>

Instead of writing explicit types for x and y, we write type variables 'a and 'b. The type of swap is 'a*'b -> 'b*'a. What this means is that we can use swap as if it had any type that we could get by consistently replacing 'a and 'b in its type with a type for 'a and a type for 'b. We can use the new swap in place of all the old definitions:

swap(1,2);          (* swap : (int * int) -> (int * int) *)
swap(3.14,2.17);    (* swap : (float * float) -> (float * float) *)
swap("foo","bar");  (* swap : (string * string) -> (string * string) *)
swap("foo",3.14);   (* swap : (string * float) -> (float * string) *)

In fact, we can leave out the type declarations in the definition of swap, and OCaml will figure out the most general polymorphic type it can be given, automatically:

# let swap (x,y) = (y,x);;
val swap : 'a * 'b -> 'b * 'a = <fun>

The ability to use swap as though it had many different types is known as polymorphism, from the Greek for "many shapes". If we think of swap as having a "shape" that its type defines, then swap can have many shapes: it is polymorphic. Notice that the requirement that type variables be substituted consistently means that some types are ruled out; for example, it is impossible to use swap at the type (int*float) -> (string*int), because that type would consistently substitute for the type variable 'a but not for 'b.

ML programmers typically read the types 'a and 'b as "alpha" and "beta". This is easier than saying "single quotation mark a", and also they wish they could write Greek letters instead. In fact a type variable may be any identifier preceded by a single quotation mark; for example, 'key and 'value are also legal type variables. The ML compiler needs to have these identifiers preceded by a single quotation mark so that it knows it is seeing a type variable.

It's important to note that swap doesn't use its arguments x or y in any interesting way. It treats them as if they were black boxes. When the OCaml type checker is checking the definition of swap, all it knows is that x is of some arbitrary type 'a. It doesn't allow any operation to be performed on x that couldn't be performed on an arbitrary type. This means that the code is guaranteed to work for any x and y. If we want some operations to be performed on values whose types are type variables, we have to provide them as function values. For example,

- fun appendString(x: 'a, s: string, toString: 'a->string): string =
  (toString x) ^ " " ^ s
val appendString = fn : 'a * string * ('a -> string) -> string
- appendString(312, "class", Int.toString)
val it = "312 class" : string
- appendString("three", "twelve", fn(s:string) -> s)
val it = "three twelve" : string

Parameterized Types

The ability to write polymorphic code is pretty useless unless it comes with the ability to define data structures whose types depend on type variables. For example, last time we defined lists of integers as

type intList = Nil | Cons of  int * intList

But we'd like to be able to make lists of any kind of value, not just integers. (The built-in lists have this capability, of course). Further, using this definition of intList, we can write lots of functions for manipulating lists, yet many of these functions don't depend on what kind of values are stored in the list. The length function is a good example:

let rec length(lst: intList): int = 
    match lst with
      Nil -> 0
    | Cons(_, rest) -> 1 + length(rest)

We can avoid defining lots of list types and associated operations by declaring a parameterized variant type instead:

type 'a list_ = Nil | Cons of 'a * 'a list_

A parameterized datatype is a recipe for creating a family of related datatypes. The name 'a is a type parameter for which any other type may be supplied. For example, int list_ is a list of integers, float list_ is a list of float, and so on. However, list_ itself is not a type. Notice also that we cannot use list_ to create a list each of whose elements can be any type. All of the elements of a T list_ must be T's.

val il : int list_ = Cons(1,Cons(2,Cons(3,Nil)))    (* [1,2,3] *)

val rl : float list_ = Cons(3.14,Cons(2.17,Nil))     (* [3.14,2.17] *)

val sl : string list_ = Cons("foo",Cons("bar",Nil)) (* ["foo","bar"] *)

val srp : (string*int) list_ = 
    Cons(("foo",1),Cons(("bar",2),Nil))  (* [("foo",1), ("bar",2)] *)

val recp : {name:string, age:int} list_ = 
    Cons({name = "Greg", age = 150},
         Cons({name = "Amy", age = 3},
              Cons({name = "John", age = 1}, Nil)))

Notice list_ itself is not a type. We can think of list_ as a function that, when applied to a type like int, produces another type (int list_). A parameterized datatype is an example of a parameterized type constructor: a function that takes in parameters and gives back a type. Other languages have parameterized type constructors. For example, in Java you can declare a parameterized class:

class List<T> {
    T head;
    List <T> tail;
    ...
}

In OCaml, we can define polymorphic functions that know how to manipulate any kind of list:

(* is the list empty? *)
let isEmpty(lst: 'a list_): bool = 
    match lst with
        Nil -> true
      | _ -> false;

(* return the length of the list *)
let rec length(lst: 'a list_): int = 
    match lst with
      Nil -> 0
    | Cons(_, rest) -> 1 + (length rest)

(* append two lists:  append([a,b,c],[d,e,f]) = [a,b,c,d,e,f] *)
let rec append(x: 'a list_, y: 'a list_): 'a list_ = 
    match x with
        Nil -> y
      | Cons(h,t) -> Cons(h, append(t, y))

val il2 = append(il,il)
val il3 = append(il2,il2)
val il4 = append(il3,il3)

val sl2 = append(sl,sl)
val sl3 = append(sl2,sl2)

(* reverse the list:  reverse([a,b,c,d]) = [d,c,b,a] *)
fun reverse(x: 'a list_): 'a list_ = 
    match x with
        Nil -> Nil
      | Cons(h,t) -> append(reverse t, Cons(h,Nil));

val il5 = reverse(il4);
val sl4 = reverse(sl3);

(* apply the function f to each element of x:  
 *    map f [a,b,c] = [f(a),f(b),f(c)] *)
fun map (f: 'a->'b) (x: 'a list_): 'b list_ = 
    match x with
        Nil -> Nil
      | Cons(h,t) -> Cons(f h, map f t)

val sl5 = map Int.toString il5

(* insert sep between each element of x: 
 *    separate(s,[a,b,c,d]) = [a,s,b,s,c,s,d] *)
fun separate(sep: 'a, x: 'a list_) = 
    match x with
        Nil -> Nil
      | Cons(h,Nil) -> x
      | Cons(h,t) -> Cons(h, Cons(sep, separate(sep,t)))

(* prints out a list of elements as long as we can convert the
 * elements to a string using to_string. *)
fun printList (toString: 'a -> string) (x: 'a list_): unit = 
    let val strings = separate(",", map toString x)
    in 
        print("[");
        map print strings;
        print("]\n")
    end

fun printInts(x: int list_): unit = 
    printList Int.toString x

fun printReals(x: float list_): unit = 
    printList Real.toString x

fun printStrings(x: string list_): unit = 
    printList (fn s -> "\"" ^ s ^ "\"") x

Lists are useful, but they are hardly the only use for type parameterization. For example, we can define a datatype for binary trees using a tuples for the nodes:

type 'a tree = Leaf | Node of ('a tree) * 'a * ('a tree)

If we use a record type for the nodes, the record type also must be parameterized, and instantiated on the same element type as the tree type:

type 'a tree = Leaf | Node of 'a node
      and 'a node = {left: 'a tree; value: 'a; right: 'a tree}

It is also possible to have multiple type parameters on a parameterized type, in which case parentheses are needed:

type ('a,'b) pair = {first: 'a; second: 'b};;
let x = {first=2; second="hello"};;
val x: (int, string) pair = {first=2; second="hello"}

Abstract syntax and variant types

Earlier we noticed that there is a similarity between BNF declarations and variant type declarations. In fact, we can define variant types that act like the corresponding BNF declarations. The values of these variant types then represent legal expressions that can occur in the language. For example, consider a BNF definition of legal OCaml type expressions:

(base types)	b ::= `int` \| `float` \| `string` \| `bool` \| `char`
(types)	t ::= b \| t `->` t \| t₁`` t₂ ``...`` t_n* \| `{` x₁ `:` t₁`,` ...`,` x_n`:` t_n `}` \| X

This grammar has exactly the same structure as the following type declarations:

type id = string
type baseType = Int | Real | String | Bool | Char
type mlType = Base of baseType | Arrow of mlType*mlType | Product of mlType list
                 | Record of (id*mlType) list | DatatypeName of id

Any legal OCaml type expression can be represented by a value of type Type that contains all the information of the corresponding type expression. This value is known as the abstract syntax for that expression. It is abstract, because it doesn't contain any information about the actual symbols used to represent the expression in the program. For example, the abstract syntax for the expression int*bool->{name: string} would be:

Arrow( Product(Cons(Base Int, Cons(Base Bool, Nil))),
       Record(Cons(("name", Base String), Nil)))

The abstract syntax would be exactly the same even for a more verbose version of the same type expression: ((int*bool)->{name: string}). Compilers typically use abstract syntax internally to represent the program that they are compiling. We will see a lot more abstract syntax later in the course when we see how ML works.

CS 3110 Lecture 4 Variant types and polymorphism