Suppose we saw some code implementing an abstract data type whose representation was defined as follows:
type t = int * int
Now, suppose the repOK function for this ADT looks like this:
fun repOK(x: t): t = let a:int = #1 x in let b:int = #2 x in x end end
Does this repOK function accomplish anything? It tests whether the #1 and #2 functions can be applied to the pair x. If x really has type t, the test will certainly succeed. In some languages (for example, Scheme or RCL), this kind of test might catch incoming values that are not of the expected type. This is because Scheme is an untyped language: nothing checks the type of x before the program runs. In ML the test is completely useless. The code might as well be
fun repOK(x: t): t = x
This is true because ML is a typed language with a sound type system. A language is typed if the compiler rejects some programs as not being well-formed, based on the expected types of values that appear to be used during computation. In a typed language, the compiler includes a type checker that determines whether the program is well-formed (also: well-typed).
The type system of a language is sound if the type checker gets it right: the expected type declared in the program (or determined by the type checker) always agrees with the actual type of the value that occurs when the program runs.
To put it another way, a language has a sound type system if in every program that passes the type checker, operations like #1 and #2 are only ever asked to operate on tuple values, and similarly for the other built-in operations.
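Returning to the repOK example, the claim that the test is useless can be checked directly. The following is a minimal sketch in actual SML syntax (so the let bindings carry the val keyword); the names repOK1, repOK2, and same are chosen here only for illustration.

```sml
type t = int * int

(* The original repOK, with the val keyword that SML requires in let. *)
fun repOK1 (x: t): t =
  let
    val a: int = #1 x
    val b: int = #2 x
  in
    x
  end

(* The simplified version: just the identity function on t. *)
fun repOK2 (x: t): t = x

(* Both return their argument unchanged on a well-typed input. *)
val same = (repOK1 (3, 4) = repOK2 (3, 4))

(* A call such as repOK1 "oops" never reaches run time: the type
   checker rejects it because string is not int * int. *)
```

Because every argument that gets past the type checker is already a pair of ints, the projections in repOK1 can never fail, which is why the two versions are interchangeable.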
We can describe this in terms of our models of evaluation. For example, consider the substitution model that we have worked with, in which evaluation is described as a sequence of small evaluation steps e --> e'. We said that an evaluation gets stuck if it arrives at an expression e that is not a value, yet no reduction can be applied to e. The type system is sound if no well-typed expression ever gets stuck.
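To make the notion of a stuck evaluation concrete, here is a small sketch of a step function for a toy language with only numbers, pairs, and a first projection. The datatype expr and the functions isValue, step, and isStuck are hypothetical names introduced for this illustration; they are not the course's evaluator.

```sml
datatype expr =
    Num of int
  | Pair of expr * expr
  | Fst of expr              (* plays the role of #1 *)

(* Values are numbers and pairs of values. *)
fun isValue (Num _) = true
  | isValue (Pair (a, b)) = isValue a andalso isValue b
  | isValue (Fst _) = false

(* step e is SOME e' if e --> e', and NONE if no reduction applies. *)
fun step (Num _) = NONE
  | step (Pair (a, b)) =
      if not (isValue a) then Option.map (fn a' => Pair (a', b)) (step a)
      else Option.map (fn b' => Pair (a, b')) (step b)
  | step (Fst (Pair (a, b))) =
      if isValue a andalso isValue b then SOME a
      else Option.map Fst (step (Pair (a, b)))
  | step (Fst e) = Option.map Fst (step e)

(* An expression is stuck if it is not a value and cannot take a step. *)
fun isStuck e = not (isValue e) andalso not (isSome (step e))

val ok = step (Fst (Pair (Num 1, Num 2)))   (* SOME (Num 1) *)
val stuck = isStuck (Fst (Num 3))           (* true: projection from a non-pair *)
```

The expression Fst (Num 3) is exactly the kind of program a sound type checker must reject, since a number is not a tuple.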
In other words, whenever e is a well-typed program with type t, either e is a value, or e --> e' where e' is also a well-typed program (with type t).
How does a type checker work for a language like ML? A large part of problem set 5 is to modify a type checker for a language like ML to support new language features, so this has important practical ramifications for you.
It's clear that we can figure out the types of constants. Suppose we have a function tcheck that takes in the AST for a program and returns its type:
(* tcheck(e) is the type of e *)
fun tcheck(e: expr): type = ...
Then we know that
tcheck(2) = int
tcheck(#"x") = char
tcheck("foo") = string
You will notice that the right-hand sides look like they name types. They're really datatype constructors:
datatype type = unit | bool | char | int | string | Id | * of type*type | -> of type*type
So we can write expressions like int->int and we actually mean a value ->(int,int) that represents the type int->int.
Actually we can't name a datatype constructor "->" in SML. We'll have to give it a different name, e.g.
datatype type = Unit | Bool | Char | Int | String | Id
| Product of type*type | Arrow of type*type
However, to keep our code compact and readable, we'll pretend that we can overload -> and * in the way shown above.
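For readers who want something that actually compiles, the following sketch shows one way the representation of types could be written in real SML. It makes some assumptions that are not in the notes: the datatype is called ty (because type is a reserved word in SML), the Id constructor is left out, and the expr datatype and the toString function are hypothetical helpers covering only constants.

```sml
(* "ty" stands in for the notes' "type", which is reserved in SML. *)
datatype ty =
    Unit | Bool | Char | Int | String
  | Product of ty * ty
  | Arrow of ty * ty

(* A cut-down expression type with only constants (hypothetical). *)
datatype expr =
    IntConst of int
  | CharConst of char
  | StringConst of string

(* tcheck e is the type of the constant e. *)
fun tcheck (IntConst _) = Int
  | tcheck (CharConst _) = Char
  | tcheck (StringConst _) = String

(* Render a ty in the notation used in the notes. *)
fun toString Unit = "unit"
  | toString Bool = "bool"
  | toString Char = "char"
  | toString Int = "int"
  | toString String = "string"
  | toString (Product (a, b)) = "(" ^ toString a ^ " * " ^ toString b ^ ")"
  | toString (Arrow (a, b)) = "(" ^ toString a ^ " -> " ^ toString b ^ ")"

val intToInt = Arrow (Int, Int)     (* the value representing int->int *)
val s = toString intToInt           (* "(int -> int)" *)
val c = tcheck (CharConst #"x")     (* Char *)
```

The constructors Int and String do not clash with the Basis structures of the same name, since constructors and structures live in different namespaces.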
Clearly we can write code to return the types of constants. But what about programs in general? The program is represented by an abstract syntax tree; as you know, it is convenient to express computations over trees recursively. Therefore we want to be able to type-check a program e (that is, implement tcheck(e)) by recursively applying tcheck to all of the subexpressions of e.
Consider this simple program:
let x: int = 2 in x end
If tcheck is implemented recursively, it has to be applied to the subexpression x (actual rep: Var("x")). But there's no way to figure out what tcheck(x) should return, because we don't have any information about the expected types of variables like x, which are unbound when considered on their own. The tcheck function needs more information: an environment that maps variable names to their expected types. We will call this environment a type environment; it's also known as a typing context.
In this case the type environment contains a single mapping {x : int}. In general a type environment looks just like a record type; it has a bunch of names and associated types. Type environments are really an abstraction supporting the following operations:
type environment
empty_env: environment
lookup_var: environment*string -> type
add_var: environment*string*type -> environment
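One simple way to realize this abstraction is an association list in which newer bindings are placed at the front. This is only a sketch under that assumption: the notes do not say how environments are implemented, and the ty datatype and the Unbound exception are names introduced here for illustration.

```sml
(* A small stand-in for the notes' "type" datatype. *)
datatype ty = Int | Bool | Arrow of ty * ty

(* Assumed representation: newest binding first. *)
type environment = (string * ty) list

exception Unbound of string

val empty_env: environment = []

(* lookup_var (env, x) is the type bound to x, or raises Unbound. *)
fun lookup_var (env: environment, x: string): ty =
  case List.find (fn (y, _) => y = x) env of
      SOME (_, t) => t
    | NONE => raise Unbound x

(* add_var (env, x, t) is env extended with x : t. *)
fun add_var (env: environment, x: string, t: ty): environment =
  (x, t) :: env

val env1 = add_var (empty_env, "x", Int)
val env2 = add_var (env1, "x", Bool)   (* the inner binding shadows the outer *)
val t1 = lookup_var (env1, "x")        (* Int *)
val t2 = lookup_var (env2, "x")        (* Bool *)
```

Because add_var returns a new list rather than mutating the old one, env1 is still usable after env2 is built, which matches how scopes nest.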
Now we augment tcheck to take in a type environment:
(* tcheck(env, e) is the type of e in the type environment env *)
val tcheck: environment * expr -> type
And we can define the obvious rule for type-checking a variable:
tcheck(env, id) = lookup_var(env,id)
The rule for typing a simple let extends the current environment:
tcheck(env, let x:t = e in e' end) = tcheck(add_var(env,x,t), e') when tcheck(env,e) = t
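The variable and let rules can be sketched in runnable SML as follows. The constructor names (IntConst, BoolConst, Var, Let), the TypeError exception, and the list-based environment are assumptions made for this example rather than the definitions used in the problem set.

```sml
datatype ty = Int | Bool

datatype expr =
    IntConst of int
  | BoolConst of bool
  | Var of string
  | Let of string * ty * expr * expr   (* let x:t = e in e' end *)

type environment = (string * ty) list
exception TypeError of string

fun lookup_var (env: environment, x) =
  case List.find (fn (y, _) => y = x) env of
      SOME (_, t) => t
    | NONE => raise TypeError ("unbound variable " ^ x)

fun add_var (env: environment, x, t): environment = (x, t) :: env

(* tcheck (env, e) is the type of e in the type environment env. *)
fun tcheck (env: environment, e: expr): ty =
  case e of
      IntConst _ => Int
    | BoolConst _ => Bool
    | Var x => lookup_var (env, x)
    | Let (x, t, e1, e2) =>
        (* e1 is checked in the outer environment, e2 in the extended one. *)
        if tcheck (env, e1) = t
        then tcheck (add_var (env, x, t), e2)
        else raise TypeError ("declared type of " ^ x ^ " does not match")

(* let x: int = 2 in x end *)
val t = tcheck ([], Let ("x", Int, IntConst 2, Var "x"))   (* Int *)
```

A rule whose "when" condition fails corresponds here to raising TypeError, so an ill-typed program never yields a type.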
Now we're ready to define more general rules for type-checking ML. We accomplish this by defining various functions:
tcheck takes an environment and an expression and returns a type.
declcheck takes an environment and a declaration and returns a new environment that includes whatever new bindings the declaration introduces.
patenv takes a pattern and its expected type and returns a new environment that contains just the bindings introduced by the pattern when it matches.
patcheck takes an environment, type, and pattern and returns an extended environment that includes whatever bindings the pattern introduces when it matches. It is defined simply in terms of patenv.
Notice that when e1 and e2 are ints, e1+e2 has type int, but when e1 and e2 are reals, e1+e2 has type real; the result type of a binary operation (such as +) may depend upon the types of its arguments. To type-check binary and unary operators, we assume there are functions that give us the argument and result types for these operations. Similarly, we assume there are some global functions, such as constructor_result_type and constructor_arg_type, that provide information about the types of data constructors and their arguments.
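As an illustration of how such operator tables might look, here is a sketch for two binary operators over ints and reals. The function names binary_op_arg_type, binary_op_result_type, and same_types come from the rules in these notes, but their bodies, the binop datatype, the helper check_binop, and the choice to key the tables on the left operand's type are all assumptions made for this example.

```sml
datatype ty = Int | Real | Bool | Product of ty * ty
datatype binop = Plus | Less

exception TypeError of string

(* Given the operator and the type of its left operand, the type
   expected for the pair of operands. *)
fun binary_op_arg_type (bop: binop, t1: ty): ty =
  case (bop, t1) of
      (_, Int) => Product (Int, Int)
    | (_, Real) => Product (Real, Real)
    | _ => raise TypeError "operator not defined at this type"

(* The result type depends on both the operator and the operand type. *)
fun binary_op_result_type (bop: binop, t1: ty): ty =
  case (bop, t1) of
      (Plus, Int) => Int
    | (Plus, Real) => Real
    | (Less, Int) => Bool
    | (Less, Real) => Bool
    | _ => raise TypeError "operator not defined at this type"

fun same_types (a: ty, b: ty): bool = (a = b)

(* Type of "e1 bop e2" given the types t1 and t2 of the operands. *)
fun check_binop (bop, t1, t2) =
  if same_types (Product (t1, t2), binary_op_arg_type (bop, t1))
  then binary_op_result_type (bop, t1)
  else raise TypeError "operand types do not match"

val r1 = check_binop (Plus, Int, Int)     (* Int *)
val r2 = check_binop (Plus, Real, Real)   (* Real *)
val r3 = check_binop (Less, Int, Int)     (* Bool *)
```

Mixing operand types, as in check_binop (Plus, Int, Real), raises TypeError, mirroring the fact that ML does not implicitly convert between int and real.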
tcheck(env, n) = int (when n is an integer constant)
tcheck(env, r) = real (when r is a real constant)
tcheck(env, s) = string (when s is a string constant)
tcheck(env, c) = char (when c is a char constant)
tcheck(env, id) = lookup_var(env,id)
tcheck(env, fn (id:t) => e) = t -> tcheck(add_var(env,id,t),e)
tcheck(env, e1(e2)) = t2 when tcheck(env, e1) = t1 -> t2 and tcheck(env, e2) = t3 and same_types(t1,t3)
tcheck(env, u e) = unary_op_result_type(t1,u) when tcheck(env, e) = t1 and unary_op_arg_type(t1,u) = t2 and same_types(t1,t2)
tcheck(env, e1 bop e2) = binary_op_result_type(bop,t1) when tcheck(env, e1) = t1 and tcheck(env, e2) = t2 and binary_op_arg_type(bop,t1) = t3 and same_types((t1 * t2), t3)
tcheck(env, (e1,e2,...,en)) = (t1 * t2 * ... * tn) when tcheck(env, ei) = ti (for 1 <= i <= n)
tcheck(env, #i e) = ti when tcheck(env, e) = (t1 * t2 * ... * tn) and (1 <= i <= n)
tcheck(env, {lab1=e1,lab2=e2,...,labn=en}) = sort_labels({lab1:t1,...,labn:tn}) when tcheck(env, ei) = ti and lab1,...,labn are distinct
tcheck(env, #lab e) = ti when tcheck(env,e) = {lab1:t1,...,labi:ti,...,labn:tn} and same_labels(lab,labi)
tcheck(env, Id) = nullary_constructor_result_type(Id)
tcheck(env, Id(e)) = constructor_result_type(Id) when tcheck(env, e) = t and same_types(t, constructor_arg_type(Id))
tcheck(env, case e of p1 => e1 | p2 => e2 | ... | pn => en) = t1
  when tcheck(env, e) = t
  and patcheck(env,t,p1) = env1 and tcheck(env1,e1) = t1
  and patcheck(env,t,p2) = env2 and tcheck(env2,e2) = t2
  ...
  and patcheck(env,t,pn) = envn and tcheck(envn,en) = tn
  and same_types([t1,t2,...,tn])
  (and p1,...,pn are exhaustive for t and do not overlap.)
tcheck(env, let d in e end) = t when declcheck(env,d) = env' and tcheck(env',e) = t
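A few of the expression rules above (functions, application, tuples, and projection) can be turned into a runnable sketch as follows. This is not the problem set's checker: the constructor names, the TypeError exception, the list-based environment, and the use of a list inside Product (so that n-ary tuples are easy to express) are all assumptions made for the example.

```sml
datatype ty = Int | Bool | Product of ty list | Arrow of ty * ty

datatype expr =
    IntConst of int
  | Var of string
  | Fn of string * ty * expr     (* fn (id:t) => e *)
  | App of expr * expr           (* e1(e2) *)
  | Tuple of expr list           (* (e1,...,en) *)
  | Proj of int * expr           (* #i e *)

type environment = (string * ty) list
exception TypeError of string

fun lookup_var (env: environment, x) =
  case List.find (fn (y, _) => y = x) env of
      SOME (_, t) => t
    | NONE => raise TypeError ("unbound variable " ^ x)

fun add_var (env: environment, x, t): environment = (x, t) :: env

fun same_types (a: ty, b: ty) = (a = b)

fun tcheck (env: environment, e: expr): ty =
  case e of
      IntConst _ => Int
    | Var x => lookup_var (env, x)
    | Fn (x, t, body) => Arrow (t, tcheck (add_var (env, x, t), body))
    | App (e1, e2) =>
        (case tcheck (env, e1) of
             Arrow (t1, t2) =>
               if same_types (t1, tcheck (env, e2)) then t2
               else raise TypeError "argument type mismatch"
           | _ => raise TypeError "applying a non-function")
    | Tuple es => Product (map (fn ei => tcheck (env, ei)) es)
    | Proj (i, e1) =>
        (case tcheck (env, e1) of
             Product ts =>
               if 1 <= i andalso i <= length ts then List.nth (ts, i - 1)
               else raise TypeError "tuple index out of range"
           | _ => raise TypeError "projection from a non-tuple")

(* fn (p: int * int) => #1 p *)
val first = Fn ("p", Product [Int, Int], Proj (1, Var "p"))
val t1 = tcheck ([], first)   (* Arrow (Product [Int, Int], Int) *)
val t2 = tcheck ([], App (first, Tuple [IntConst 1, IntConst 2]))   (* Int *)
```

Each "when" clause in a rule becomes either a pattern match on the result of a recursive call or an explicit same_types test, and every failing clause raises TypeError.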
declcheck(env, val p = e) = env' when tcheck(env,e) = t and patcheck(env,t,p) = env' (and p is exhaustive for t)
declcheck(env, fun id1(id2:t1):t2 = e) = env'' when env'' = add_var(env,id1,t1->t2) and env' = add_var(env'',id2,t1) and tcheck(env', e) = t3 and same_types(t2,t3)
patcheck(env, t, p) = union_env(env,env') when vars(p) are unique and patenv(t, p) = env'
patenv(t, _) = empty_env
patenv(t, id) = add_var(empty_env,id,t)
patenv(t, c) = empty_env when tcheck(empty_env,c) = t2 and same_types(t,t2)
patenv((t1 * ... * tn), (p1,...,pn)) = union_envs(env1,...,envn) when patenv(ti,pi) = envi
patenv({lab1:t1,...,labn:tn}, {lab1=p1,...,labn=pn}) = union_envs(env1,...,envn) when patenv(ti,pi) = envi
patenv(t, Id) = empty_env when nullary_constructor_result_type(Id) = t2 and same_types(t,t2)
patenv(t, Id(p)) = env when constructor_result_type(Id) = t2 and same_types(t,t2) and constructor_arg_type(Id) = t3 and patenv(t3,p) = env
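The pattern rules can be sketched in the same style. The version below covers only wildcard, variable, integer-constant, and tuple patterns; the pat datatype, the unique helper, the TypeError exception, and the list-based environment (where union_env lets the pattern's bindings shadow earlier ones) are assumptions introduced for this illustration.

```sml
datatype ty = Int | Product of ty list

datatype pat =
    Wild                   (* _ *)
  | PVar of string         (* id *)
  | PInt of int            (* an integer constant *)
  | PTuple of pat list     (* (p1,...,pn) *)

type environment = (string * ty) list
exception TypeError of string

val empty_env: environment = []
fun add_var (env: environment, x, t): environment = (x, t) :: env

(* Bindings in env' shadow those in env. *)
fun union_env (env: environment, env': environment): environment = env' @ env

(* patenv (t, p) holds just the bindings p introduces when matching a t. *)
fun patenv (t: ty, p: pat): environment =
  case (t, p) of
      (_, Wild) => empty_env
    | (_, PVar x) => add_var (empty_env, x, t)
    | (Int, PInt _) => empty_env
    | (Product ts, PTuple ps) =>
        if length ts = length ps
        then List.concat (ListPair.map patenv (ts, ps))
        else raise TypeError "tuple pattern has the wrong arity"
    | _ => raise TypeError "pattern does not match the expected type"

(* True if no name occurs twice in the list. *)
fun unique [] = true
  | unique (x :: xs) = not (List.exists (fn y => y = x) xs) andalso unique xs

(* patcheck extends env with the pattern's bindings, which must be unique. *)
fun patcheck (env: environment, t: ty, p: pat): environment =
  let
    val env' = patenv (t, p)
  in
    if unique (map (fn (x, _) => x) env') then union_env (env, env')
    else raise TypeError "a variable occurs twice in the pattern"
  end

(* Matching (a, _) against int * int binds a : int. *)
val env = patcheck (empty_env, Product [Int, Int], PTuple [PVar "a", Wild])
```

This sketch deliberately leaves out the exhaustiveness and overlap checks mentioned in the rules.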
This simple treatment does not deal with type inference, but it turns out to be relatively straightforward to do so. In addition, this treatment does not deal with polymorphism, but again, this is relatively simple to add to the type checker. Finally, we aren't specifying how to check that a list of patterns is exhaustive and does not overlap. This turns out to be fairly tricky to implement.