Type systems are nice but they can get in your way. In a lot of programming languages (e.g., Java) we find that we end up rewriting the same code over and over again so that it works for different types. SML doesn't have this problem, but we need to introduce new features to show how to avoid it. Suppose we want to write a function that swaps the position of values in an ordered pair:
fun swapInt(x:int,y:int):(int*int) = (y,x) fun swapReal(x:real,y:real):(real*real) = (y,x) fun swapString(x:string,y:string):(string*string) = (y,x)This is tedious, because we're writing exactly the same algorithm each time. It gets worse! What if the two pair elements have different types?
fun swapIntReal(x:int,y:real):(real*int) = (y,x) fun swapRealInt(x:real,y:int):(int*real) = (y,x)And so on. There has to be a better way... and here it is:
- fun swap(x:'a, y:'b):('b * 'a) = (y,x)
val swap = fn : 'a * 'b -> 'b * 'a
Instead of writing explicit types for x
and y
, we write
type variables 'a
and 'b
. The type of
swap
is 'a*'b
->
'b*'a
. What
this means is that we can use swap as if it had any type that we could get by
consistently replacing 'a
and 'b
in its type with a
type for 'a
and a type for 'b
. We can use the new swap
in place of all the old definitions:
swap(1,2); (* swap : (int * int) -> (int * int) *) swap(3.14,2.17); (* swap : (real * real) -> (real * real) *) swap("foo","bar"); (* swap : (string * string) -> (string * string) *) swap("foo",3.14); (* swap : (string * real) -> (real * string) *)
This ability to use swap as though it had many different types is known as polymorphism,
from the Greek for "many shapes". If we think of swap as having a
"shape" that its type defines, then swap can have many shapes: it is polymorphic.
Notice that the requirement that type variables be substituted consistently
means that some types are ruled out; for example, it is impossible to use swap
at the type (int*real) -> (string*int)
, because that type would
consistently substitute for the type variable 'a
but not for 'b
.
ML programmers typically read the types 'a
and 'b
as "alpha" and
"beta". This is easier than saying "single quotation mark
a", and also they wish they could write Greek letters instead. In fact a
type variable may be any identifier preceded by a single quotation mark; for
example, 'key
and 'value
are also legal type
variables. The ML compiler needs to have these identifiers preceded by a single
quotation mark so that it knows it is seeing a type variable.
It's important to note that swap
doesn't use its arguments x
or
y
in any
interesting way. It treats them as if they were black boxes. When the SML type
checker is checking the definition of swap, all it knows is that x
is of some
arbitrary type 'a
. It doesn't allow any operation to be performed on
x
that
couldn't be performed on an arbitrary type. This means that the code is
guaranteed to work for any x
and y
. If we want some operations to be performed
on values whose types are type variables, we have to provide them as function
values. For example,
- fun appendString(x: 'a, s:string, toString: 'a->string): string =
toString(x) ^ " " + s
val appendString = fn : 'a * string * ('a -> string) -> string
- appendString(312, "class", Int.toString)
val it = "312 class" : string
- appendString("three", "twelve", fn s:string => s)
val it = "three twelve" : string
The ability to write polymorphic code is pretty useless unless it comes with the ability to define data structures whose types depend on type variables. For example, last time we defined lists of integers as
datatype intlist = Nil | Cons of (int * intlist)But we'd like to be able to make lists of any kind of value, not just integers. (The built-in lists have this capability, of course). Further, using this definition of
intlist
, we can write lots of functions for manipulating
lists, yet many of these functions don't depend on what kind of values are stored in the list. The length
function is a good example:
fun length(lst:intlist):int = case lst of Nil => 0 | Cons(_,rest:intlist) => 1 + length(rest)We can avoid defining lots of list datatypes and associated operations by declaring a parameterized datatype instead:
datatype 'a List = Nil | Cons of 'a * ('a List)
A parameterized datatype is a recipe for creating a family of related
datatypes. The type variable 'a
is a type parameter for which any
other type may be supplied. For example, int List
is a list of
integers, real List
is a list of reals, and so on. However, List
itself is not a type. Notice also that we cannot use List
to create
a list each of whose elements can be any type. All of the elements of a T
List
must be T
's.
val il : int List = Cons(1,Cons(2,Cons(3,Nil))); (* [1,2,3] *) val rl : real List = Cons(3.14,Cons(2.17,Nil)); (* [3.14,2.17] *) val sl : string List = Cons("foo",Cons("bar",Nil)) (* ["foo","bar"] *) val srp : (string*int) List = Cons(("foo",1),Cons(("bar",2),Nil)); (* [("foo",1), ("bar",2)] *) val recp : {name:string, age:int} List = Cons({name = "Greg", age = 150}, Cons({name = "Amy", age = 3}, Cons({name = "John", age = 1}, Nil)))Notice
List
itself is not a type. We can think of List
as a function that, when applied to a
type like int
, produces another type (int
List
).
We can also define polymorphic functions that know how to manipulate any kind of lists:
(* is the list empty? *) fun isEmpty(lst:'a List):bool = case lst of Nil => true | _ => false;
(* return the length of the list *) fun length(lst:'a List):int = case lst of Nil => 0 | Cons(_,rest:'a List) => 1 + length(rest) (* append two lists: append([a,b,c],[d,e,f]) = [a,b,c,d,e,f] *) fun append(x:'a List, y:'a List):'a List = case x of Nil => y | Cons(hd:'a, tl:'a List) => Cons(hd, append(tl, y)) val il2 = append(il,il) val il3 = append(il2,il2) val il4 = append(il3,il3) val sl2 = append(sl,sl) val sl3 = append(sl2,sl2) (* reverse the list: reverse([a,b,c,d]) = [d,c,b,a] *) fun reverse(x:'a List):'a List = case x of Nil => Nil | Cons(hd,tl) => append(reverse(tl), Cons(hd,Nil)); val il5 = reverse(il4); val sl4 = reverse(sl3); (* apply the function f to each element of x: * map f [a,b,c] = [f(a),f(b),f(c)] *) fun map (f:'a->'b) (x:'a List):'b List = case x of Nil => Nil | Cons(hd,tl) => Cons(f hd, map f tl) val sl5 = map Int.toString il5 (* insert sep between each element of x: * separate(s,[a,b,c,d]) = [a,s,b,s,c,s,d] *) fun separate(sep:'a, x:'a List) = case x of Nil => Nil | Cons(hd,Nil) => x | Cons(hd,tl) => Cons(hd,Cons(sep,separate(sep,tl))) (* prints out a list of elements as long as we can convert the * elements to a string using to_string. *) fun printList (toString:'a -> string) (x:'a List):unit = let val strings = separate(",", map toString x) in print("["); map print strings; print("]\n") end fun printInts(x:int List):unit = printList Int.toString x fun printReals(x:real List):unit = printList Real.toString x fun printStrings(x:string List):unit = printList (fn s => "\"" ^ s ^ "\"") xLists are useful, but they are hardly the only use for type parameterization. For example, we can define a datatype for binary trees:
datatype 'a Tree = Leaf | Node of ('a Tree) * 'a * ('a Tree)
Earlier we noticed that there is a similarity between BNF declarations and datatype declarations. In fact, we can define datatype declarations that act like the corresponding BNF declarations. The values of these datatypes then represent legal expressions that can occur in the language. For example, consider a BNF definition of legal SML type expressions:
(base types) | b ::= int | real | string
| bool | char
|
(types) | t ::= b | t ->
t | t1 * t2
* ...* tn
| { x1 : t1,
...,
xn: tn
} | X
|
This grammar has exactly the same structure as the following datatype declarations:
type id = string datatype baseType = Int | Real | String | Bool | Char datatype mlType = Base of baseType | Arrow of mlType*mlType | Product of mlType List | Record of (id*mlType) List | DatatypeName of id
Any legal SML type expression can be represented by a value of type Type
that contains all the information of the corresponding type expression. This value is known as
the abstract syntax for that expression. It is abstract, because it
doesn't contain any information about the actual symbols used to represent the
expression in the program. For example, the abstract syntax for the expression int*bool->{name:
string}
would be:
Arrow( Product(Cons(Base Int, Cons(Base Bool, Nil))), Record(Cons(("name", Base String), Nil)))
The abstract syntax would be exactly the same even for a more verbose version
of the same type expression: ((int*bool)->{name:
string})
. Compilers typically use abstract syntax internally to represent the program
that they are compiling. We will see a lot more abstract syntax later in the
course when we see how ML works.