CS312 Lecture 21
Type Checking

Suppose we saw some code implementing an abstract data type whose representation was defined as follows:

type t = int * int

Now, suppose the repOK function for this ADT looks like this:

fun repOK(x: t): t =
  let val a:int = #1 x
  in
    let val b:int = #2 x
    in
       x
    end
  end

Does this repOK function accomplish anything? It tests whether the #1 and #2 functions can be applied to the pair x. If x really has type t, it will certainly succeed. In some languages (for example, Scheme or RCL), this kind of test might catch incoming values that are not of the expected type, because those languages do not check types before the program runs; Scheme is untyped in this sense. In ML the test is completely useless. The code might as well be

 fun repOK(x: t): t = x

This is true because ML is a typed language with a sound type system. A language is typed if the compiler rejects some programs as not being well-formed, based on the expected types of values that appear to be used during computation. In a typed language, the compiler includes a type checker that determines whether the program is well-formed (also: well-typed).

Soundness

The type system of a language is sound if the type checker gets it right: the expected type declared in the program (or determined by the type checker) always agrees with the actual type of the value that occurs when the program runs.

To put it another way, a language has a sound type system if in every program that passes the type checker, operations like #1, #2 are only ever asked to operate on tuple values, and similarly for the other built-in operations.

We can describe this in terms of our models of evaluation. For example, consider the substitution model that we have worked with, in which evaluation is described as a sequence of small evaluation steps e --> e'. We said that an evaluation gets stuck if it arrives at an expression e that is not a value and yet no reduction applies to it. The type system is sound if no well-typed expression ever gets stuck.

In other words, whenever e is a well-typed program with type t, either

e is a value (the program is just a constant that does not need evaluation), or
e --> e' where e' is also a well-typed program (with type t)
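
To make "stuck" concrete, here is a minimal sketch of a small-step evaluator for a hypothetical mini-language with integers, pairs, and a first projection. The names exp, Num, Pair, Fst, isValue, step, and stuck are assumptions made for this illustration and are not part of the course code.

```sml
(* Hypothetical mini-language: integers, pairs, and first projection. *)
datatype exp = Num of int
             | Pair of exp * exp
             | Fst of exp            (* plays the role of #1 *)

(* A value is a number or a pair of values. *)
fun isValue (Num _) = true
  | isValue (Pair (a, b)) = isValue a andalso isValue b
  | isValue (Fst _) = false

(* step e = SOME e' when e --> e', and NONE when no reduction applies. *)
fun step (Num _) = NONE
  | step (Pair (a, b)) =
      if isValue a
      then Option.map (fn b' => Pair (a, b')) (step b)
      else Option.map (fn a' => Pair (a', b)) (step a)
  | step (Fst (Pair (a, b))) =
      if isValue (Pair (a, b)) then SOME a
      else Option.map Fst (step (Pair (a, b)))
  | step (Fst e) = Option.map Fst (step e)

(* Stuck: not a value, yet no step is possible. *)
fun stuck e = not (isValue e) andalso not (isSome (step e))
```

Here Fst (Num 3) is stuck, because projection is asked to operate on a non-pair. A sound type checker for this mini-language would have to reject that expression before it runs.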

Designing a type checker

How does a type checker work for a language like ML? A large part of problem set 5 is to modify a type checker for a language like ML to support new language features, so this has important practical ramifications for you.

It's clear that we can figure out the types of constants. Suppose we have a function tcheck that takes in the AST for a program and returns its type:

(* tcheck(e) is the type of e *)
fun tcheck(e: expr): type = ...

Then we know that

tcheck(2) = int
tcheck(#"x") = char
tcheck("foo") = string

You will notice that the right-hand sides look like they name types. They're really datatype constructors:

datatype type = unit | bool | char | int | string | Id | * of type*type | -> of type*type

So we can write expressions like int->int and we actually mean a value ->(int,int) that represents the type int->int. Actually we can't name a datatype constructor "->" in SML. We'll have to give it a different name, e.g.

datatype type = Unit | Bool | Char | Int | String | Id
                     | Product of type*type | Arrow of type*type

However, to keep our code compact and readable, we'll pretend that we can overload -> and * in the way shown above.
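
As a rough sketch of what the compilable version might look like, the code below defines the representation of types and checks constants only. It assumes the name ty in place of type (which is a reserved word in SML), omits the Id constructor, and uses a hypothetical constant-only AST (IntConst, CharConst, StringConst); the printer tyToString is likewise an illustration.

```sml
(* `ty` stands in for the lecture's `type`, which is reserved in SML. *)
datatype ty = Unit | Bool | Char | Int | String
            | Product of ty * ty
            | Arrow of ty * ty

(* Hypothetical AST covering constants only. *)
datatype expr = IntConst of int
              | CharConst of char
              | StringConst of string

(* tcheck e is the type of the constant e. *)
fun tcheck (IntConst _) = Int
  | tcheck (CharConst _) = Char
  | tcheck (StringConst _) = String

(* Print a type representation in ML-like concrete syntax. *)
fun tyToString Unit = "unit"
  | tyToString Bool = "bool"
  | tyToString Char = "char"
  | tyToString Int = "int"
  | tyToString String = "string"
  | tyToString (Product (a, b)) =
      "(" ^ tyToString a ^ " * " ^ tyToString b ^ ")"
  | tyToString (Arrow (a, b)) =
      "(" ^ tyToString a ^ " -> " ^ tyToString b ^ ")"
```

With these definitions the value Arrow (Int, Int) represents the type int->int.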

Clearly we can write code to return the types of constants. But what about programs in general? The program is represented by an abstract syntax tree; as you know, it is convenient to express computations over trees recursively. Therefore we want to be able to type-check a program e -- that is, implement tcheck(e) -- by recursively applying tcheck to all of the subexpressions of e.

Consider this simple program:

let x: int = 2
in
  x
end

If tcheck is implemented recursively, it has to be applied to the subexpression x (actual rep: Var("x")). But there's no way to figure out what tcheck(x) should return, because the subexpression alone carries no information about the expected type of a variable like x that is bound outside it. The tcheck function needs more information: an environment that maps variable names to their expected types. We will call this environment a type environment; it's also known as a typing context.

In this case the type environment contains a single mapping {x : int}. In general a type environment looks just like a record type; it has a bunch of names and associated types. Type environments are really an abstraction supporting the following operations:

type environment
empty_env: environment
lookup_var: environment*string -> type
add_var: environment*string*type -> environment
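
One simple way to realize this abstraction is an association list, sketched below. The list representation, the exception Unbound, and the small ty datatype are assumptions made for illustration; any implementation with the same operations would serve.

```sml
(* A small stand-in for the representation of types. *)
datatype ty = Int | Bool | Arrow of ty * ty

(* Newest bindings sit at the front of the list, so they shadow older ones. *)
type environment = (string * ty) list

exception Unbound of string

val empty_env : environment = []

(* lookup_var (env, x) is the type bound to x, or raises Unbound x. *)
fun lookup_var (env : environment, x : string) : ty =
  case List.find (fn (y, _) => y = x) env of
      SOME (_, t) => t
    | NONE => raise Unbound x

(* add_var (env, x, t) extends env with the binding x : t. *)
fun add_var (env : environment, x : string, t : ty) : environment =
  (x, t) :: env
```

Because add_var returns a new list and never mutates the old one, the checker can extend the environment for a subexpression without affecting the environment of the enclosing expression.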

Now we augment tcheck to take in a type environment:

(* tcheck(env, e) is the type of e in the type environment env *)
val tcheck: environment * expr -> type

And we can define the obvious rule for type-checking a variable:

tcheck(env, id) = lookup_var(env,id)

The rule for typing a simple let extends the current environment:

tcheck(env, let x:t = e in e' end) = tcheck(add_var(env,x,t), e')
  when tcheck(env,e) = t
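
The two rules above can be sketched in SML as follows. The AST constructors (Num, BoolConst, Var, Let), the exception TypeError, and the list-based environment are assumptions chosen for this example.

```sml
datatype ty = Int | Bool

(* Let (x, t, e, body) represents: let x:t = e in body end *)
datatype expr = Num of int
              | BoolConst of bool
              | Var of string
              | Let of string * ty * expr * expr

exception TypeError of string

type environment = (string * ty) list

fun lookup_var (env : environment, x : string) : ty =
  case List.find (fn (y, _) => y = x) env of
      SOME (_, t) => t
    | NONE => raise TypeError ("unbound variable " ^ x)

fun add_var (env : environment, x : string, t : ty) : environment =
  (x, t) :: env

(* tcheck (env, e) is the type of e in the type environment env. *)
fun tcheck (env, Num _) = Int
  | tcheck (env, BoolConst _) = Bool
  | tcheck (env, Var x) = lookup_var (env, x)
  | tcheck (env, Let (x, t, e, body)) =
      (* e is checked in env; only the body sees the new binding. *)
      if tcheck (env, e) = t
      then tcheck (add_var (env, x, t), body)
      else raise TypeError ("definition of " ^ x ^ " has the wrong type")
```

For example, tcheck ([], Let ("x", Int, Num 2, Var "x")) evaluates to Int, which matches the simple program shown earlier.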

Type-checking rules

Now we're ready to define more general rules for type-checking ML. We accomplish this by defining various functions:

tcheck takes an environment and an expression and returns a type.
declcheck takes an environment and a declaration and returns a new environment that includes whatever new bindings the declaration introduces.
patenv takes a pattern and its expected type and returns a new environment that contains just the bindings introduced by the pattern when it matches.
patcheck takes an environment, type, and pattern and returns an extended environment that includes whatever bindings the pattern introduces when it matches. It is defined simply in terms of patenv.

To type-check unary and binary operators, we assume there are functions that give us the argument and result types for these operations. The result type of an operation may depend on the types of its arguments: when e₁ and e₂ are ints, e₁+e₂ has type int, but when e₁ and e₂ are reals, e₁+e₂ has type real. Similarly, we assume there are some global functions, such as constructor_result_type and constructor_arg_type, that provide information about the types of data constructors and their arguments.

Constants:

tcheck(env, n) = int (where n is an integer constant)
tcheck(env, r) = real (where r is a real constant)
tcheck(env, s) = string (where s is a string constant)
tcheck(env, c) = char (where c is a char constant)

Variables:

tcheck(env, id) = lookup_var(env,id)

Anonymous Functions:

tcheck(env, fn (id:t) => e) = t -> tcheck(add_var(env,id,t),e)

Function Applications:

tcheck(env, e₁(e₂)) = t₂
  when tcheck(env, e₁) = t₁ -> t₂
   and tcheck(env, e₂) = t₃
   and same_types(t₁,t₃)
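
The rules for anonymous functions and applications might be implemented along these lines. This is a sketch under assumptions: the constructors Fn and App, the exception TypeError, and the definition of same_types as structural equality are all choices made for the example.

```sml
datatype ty = Int | Bool | Arrow of ty * ty

(* Fn (x, t, e) represents fn (x:t) => e; App (e1, e2) represents e1(e2). *)
datatype expr = Num of int
              | Var of string
              | Fn of string * ty * expr
              | App of expr * expr

exception TypeError of string

type environment = (string * ty) list

fun lookup_var (env : environment, x : string) : ty =
  case List.find (fn (y, _) => y = x) env of
      SOME (_, t) => t
    | NONE => raise TypeError ("unbound variable " ^ x)

fun add_var (env : environment, x : string, t : ty) : environment =
  (x, t) :: env

(* Without polymorphism, two types are the same when structurally equal. *)
fun same_types (t1 : ty, t2 : ty) : bool = (t1 = t2)

fun tcheck (env, Num _) = Int
  | tcheck (env, Var x) = lookup_var (env, x)
  | tcheck (env, Fn (x, t, body)) =
      Arrow (t, tcheck (add_var (env, x, t), body))
  | tcheck (env, App (e1, e2)) =
      (case tcheck (env, e1) of
           Arrow (t1, t2) =>
             if same_types (t1, tcheck (env, e2)) then t2
             else raise TypeError "argument type does not match parameter"
         | _ => raise TypeError "applying a non-function")
```

Note that the application case fails in two distinct ways: either e₁ does not have an arrow type at all, or its parameter type disagrees with the type of e₂.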

Unary Operations:

tcheck(env, u e) = unary_op_result_type(u,t₁)
  when tcheck(env, e) = t₁
   and unary_op_arg_type(u,t₁) = t₂
   and same_types(t₁,t₂)

Binary Operations:

tcheck(env, e₁ bop e₂) = binary_op_result_type(bop,t₁)
  when tcheck(env, e₁) = t₁
   and tcheck(env, e₂) = t₂
   and binary_op_arg_type(bop,t₁) = t₃
   and same_types((t₁ * t₂), t₃)
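
A minimal sketch of the binary-operator rule for two overloaded operators follows. The operator set (Plus, Less), the tables inside binary_op_arg_type and binary_op_result_type, and the AST are assumptions for illustration. The environment is omitted because this fragment has no variables.

```sml
datatype ty = Int | Real | Bool | Product of ty * ty

datatype binop = Plus | Less

datatype expr = IntConst of int
              | RealConst of real
              | Binop of expr * binop * expr

exception TypeError of string

fun same_types (t1 : ty, t2 : ty) : bool = (t1 = t2)

(* Expected type of the argument pair, chosen by the left operand's type. *)
fun binary_op_arg_type (Plus, Int) = Product (Int, Int)
  | binary_op_arg_type (Plus, Real) = Product (Real, Real)
  | binary_op_arg_type (Less, Int) = Product (Int, Int)
  | binary_op_arg_type (Less, Real) = Product (Real, Real)
  | binary_op_arg_type _ = raise TypeError "operator not defined at this type"

(* Result type: + returns its operand type, < always returns bool. *)
fun binary_op_result_type (Plus, t) = t
  | binary_op_result_type (Less, _) = Bool

fun tcheck (IntConst _) = Int
  | tcheck (RealConst _) = Real
  | tcheck (Binop (e1, bop, e2)) =
      let
        val t1 = tcheck e1
        val t2 = tcheck e2
        val t3 = binary_op_arg_type (bop, t1)
      in
        if same_types (Product (t1, t2), t3)
        then binary_op_result_type (bop, t1)
        else raise TypeError "operand types do not match the operator"
      end
```

Choosing the overloading from the left operand alone means that an expression mixing an int and a real is rejected rather than silently converted.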

Tuples:

tcheck(env, (e₁,e₂,...,e_n)) = (t₁ * t₂ * ... * t_n)
  when tcheck(env, e_i) = t_i (for 1 <= i <= n)

Tuple Projections:

tcheck(env, #i e) = t_i
  when tcheck(env, e) = (t₁ * t₂ * ... * t_n)
   and (1 <= i <= n)
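
The tuple rules can be sketched as below. This example assumes an n-ary representation Tuple of ty list instead of the binary product shown earlier, along with hypothetical constructors TupleExp and Proj; the environment is omitted since the fragment has no variables.

```sml
datatype ty = Int | Bool | Tuple of ty list

(* Proj (i, e) represents #i e, with i counted from 1. *)
datatype expr = Num of int
              | BoolConst of bool
              | TupleExp of expr list
              | Proj of int * expr

exception TypeError of string

fun tcheck (Num _) = Int
  | tcheck (BoolConst _) = Bool
  | tcheck (TupleExp es) = Tuple (map tcheck es)
  | tcheck (Proj (i, e)) =
      (case tcheck e of
           Tuple ts =>
             (* The side condition 1 <= i <= n from the rule. *)
             if 1 <= i andalso i <= length ts
             then List.nth (ts, i - 1)
             else raise TypeError "projection index out of range"
         | _ => raise TypeError "projection from a non-tuple")
```

The second failure case is exactly the situation the repOK function at the start of the lecture was testing for at run time; here it is ruled out before the program runs.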

Records:

tcheck(env, {lab₁=e₁,lab₂=e₂,...,lab_n=e_n}) = sort_labels({lab₁:t₁,...,lab_n:t_n})
  when tcheck(env, e_i) = t_i (for 1 <= i <= n)
   and lab₁,...,lab_n are distinct

Record Projections:

tcheck(env, #lab e) = t_i
  when tcheck(env,e) = {lab₁:t₁,...,lab_i:t_i,...,lab_n:t_n}
   and same_labels(lab,lab_i)
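
Sorting the labels gives every record type a canonical form, so that {a=1, b=true} and {b=true, a=1} receive equal types. The sketch below assumes a list-of-fields representation, hypothetical constructors RecordExp and Field, and an insertion sort for sort_labels; the environment is again omitted.

```sml
datatype ty = Int | Bool | Record of (string * ty) list

(* Field (lab, e) represents #lab e. *)
datatype expr = Num of int
              | BoolConst of bool
              | RecordExp of (string * expr) list
              | Field of string * expr

exception TypeError of string

(* Insert one field into a list already sorted by label. *)
fun insert ((l, t), []) = [(l, t)]
  | insert ((l, t), (l', t') :: rest) =
      if String.compare (l, l') = LESS
      then (l, t) :: (l', t') :: rest
      else (l', t') :: insert ((l, t), rest)

fun sort_labels fields = foldl insert [] fields

fun distinct [] = true
  | distinct (l :: ls) =
      not (List.exists (fn l' => l' = l) ls) andalso distinct ls

fun tcheck (Num _) = Int
  | tcheck (BoolConst _) = Bool
  | tcheck (RecordExp fields) =
      if distinct (map (fn (l, _) => l) fields)
      then Record (sort_labels (map (fn (l, e) => (l, tcheck e)) fields))
      else raise TypeError "duplicate record label"
  | tcheck (Field (lab, e)) =
      (case tcheck e of
           Record fts =>
             (case List.find (fn (l, _) => l = lab) fts of
                  SOME (_, t) => t
                | NONE => raise TypeError ("no field named " ^ lab))
         | _ => raise TypeError "field selection from a non-record")
```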

Simple Datatype Constructors:

tcheck(env, Id) = nullary_constructor_result_type(Id)

Value-Carrying Datatype Constructors:

tcheck(env, Id(e)) = constructor_result_type(Id)
  when tcheck(env, e) = t
   and same_types(t, constructor_arg_type(Id))

Case Expressions:

tcheck(env, case e of p₁ => e₁ | p₂ => e₂ | ... | p_n => e_n) = t₁
  when tcheck(env, e) = t
   and patcheck(env,t,p₁) = env₁
   and tcheck(env₁,e₁) = t₁
   and patcheck(env,t,p₂) = env₂
   and tcheck(env₂,e₂) = t₂
     ...
   and patcheck(env,t,p_n) = env_n
   and tcheck(env_n,e_n) = t_n
   and same_types([t₁,t₂,...,t_n])

   (and p₁,...,p_n are exhaustive for t and do not overlap.)
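
The case rule might look like this in code. It is a sketch that assumes only three pattern forms (Wild, VarPat, IntPat), defines patcheck directly for them, and does not attempt the exhaustiveness and overlap checks mentioned in the parenthetical condition.

```sml
datatype ty = Int | Bool

datatype pat = Wild | VarPat of string | IntPat of int

(* Case (e, [(p1, e1), ..., (pn, en)]) represents a case expression. *)
datatype expr = Num of int
              | BoolConst of bool
              | Var of string
              | Case of expr * (pat * expr) list

exception TypeError of string

type environment = (string * ty) list

fun lookup_var (env : environment, x : string) : ty =
  case List.find (fn (y, _) => y = x) env of
      SOME (_, t) => t
    | NONE => raise TypeError ("unbound variable " ^ x)

(* patcheck (env, t, p) is env extended with the bindings of p at type t. *)
fun patcheck (env : environment, _ : ty, Wild) = env
  | patcheck (env, t, VarPat x) = (x, t) :: env
  | patcheck (env, t, IntPat _) =
      if t = Int then env
      else raise TypeError "integer pattern at a non-int type"

fun tcheck (env, Num _) = Int
  | tcheck (env, BoolConst _) = Bool
  | tcheck (env, Var x) = lookup_var (env, x)
  | tcheck (env, Case (e, rules)) =
      let
        val t = tcheck (env, e)
        (* Each branch is checked in env extended by its own pattern. *)
        val ts = map (fn (p, body) => tcheck (patcheck (env, t, p), body)) rules
      in
        case ts of
            [] => raise TypeError "case expression with no rules"
          | t1 :: rest =>
              if List.all (fn ti => ti = t1) rest then t1
              else raise TypeError "case branches have different types"
      end
```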

Let Expressions:

tcheck(env, let d in e end) = t
  when declcheck(env,d) = env'
   and tcheck(env',e) = t

Val Declarations:

declcheck(env, val p = e) = env'
  when tcheck(env,e) = t
   and patcheck(env,t,p) = env'
   (and p is exhaustive for t)

Fun Declarations:

declcheck(env, fun id₁(id₂:t₁):t₂ = e) = env''
  when env'' = add_var(env,id₁,t₁->t₂)
   and env' = add_var(env'',id₂,t₁)
   and tcheck(env', e) = t₃
   and same_types(t₂,t₃)
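
The key point of the fun rule is that the body is checked in an environment that already contains the function itself, which is what allows recursion. A minimal sketch, assuming a hypothetical Fun constructor for declarations and the list-based environment used for illustration elsewhere:

```sml
datatype ty = Int | Bool | Arrow of ty * ty

datatype expr = Num of int
              | Var of string
              | App of expr * expr

(* Fun (f, x, t1, t2, body) represents: fun f (x:t1) : t2 = body *)
datatype decl = Fun of string * string * ty * ty * expr

exception TypeError of string

type environment = (string * ty) list

fun lookup_var (env : environment, x : string) : ty =
  case List.find (fn (y, _) => y = x) env of
      SOME (_, t) => t
    | NONE => raise TypeError ("unbound variable " ^ x)

fun add_var (env : environment, x : string, t : ty) : environment =
  (x, t) :: env

fun same_types (t1 : ty, t2 : ty) : bool = (t1 = t2)

fun tcheck (env, Num _) = Int
  | tcheck (env, Var x) = lookup_var (env, x)
  | tcheck (env, App (e1, e2)) =
      (case tcheck (env, e1) of
           Arrow (t1, t2) =>
             if same_types (t1, tcheck (env, e2)) then t2
             else raise TypeError "argument type does not match parameter"
         | _ => raise TypeError "applying a non-function")

fun declcheck (env, Fun (f, x, t1, t2, body)) =
  let
    val env'' = add_var (env, f, Arrow (t1, t2))  (* visible after the decl *)
    val env' = add_var (env'', x, t1)             (* visible in the body *)
  in
    if same_types (t2, tcheck (env', body)) then env''
    else raise TypeError ("body of " ^ f ^ " does not have the declared type")
  end
```

The returned environment binds only the function name; the parameter is in scope in the body alone.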

Pattern Checking:

patcheck(env, t, p) = union_env(env,env')
  when vars(p) are unique
   and patenv(t, p) = env'

Wildcards:

patenv(t, _) = empty_env

Variable Patterns:

patenv(t, id) = add_var(empty_env,id,t)

Constant Patterns:

patenv(t, c) = empty_env
  when tcheck(empty_env,c) = t₂
   and same_types(t,t₂)

Tuple Patterns:

patenv((t₁ * ... * t_n), (p₁,...,p_n)) = union_envs(env₁,...,env_n)
  when patenv(t_i,p_i) = env_i

Record Patterns:

patenv({lab₁:t₁,...,lab_n:t_n}, {lab₁=p₁,...,lab_n=p_n}) = union_envs(env₁,...,env_n)
  when patenv(t_i,p_i) = env_i

Simple Constructor Patterns:

patenv(t, Id) = empty_env
  when nullary_constructor_result_type(Id) = t₂
   and same_types(t,t₂)

Value-Carrying Constructor Patterns:

patenv(t, Id(p)) = env
  when constructor_result_type(Id) = t₂
   and same_types(t,t₂)
   and constructor_arg_type(Id) = t₃
   and patenv(t₃,p) = env
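
Several of the pattern rules can be sketched together as follows. The example covers wildcards, variables, integer constants, and tuples only; constructor and record patterns are left out. The n-ary Tuple representation, the helper unique, and the use of list append for union_env are assumptions made for this illustration.

```sml
datatype ty = Int | Bool | Tuple of ty list

datatype pat = Wild
             | VarPat of string
             | IntPat of int
             | TuplePat of pat list

exception TypeError of string

type environment = (string * ty) list

(* patenv (t, p): just the bindings p introduces when matched at type t. *)
fun patenv (_ : ty, Wild) : environment = []
  | patenv (t, VarPat x) = [(x, t)]
  | patenv (t, IntPat _) =
      if t = Int then []
      else raise TypeError "integer pattern at a non-int type"
  | patenv (Tuple ts, TuplePat ps) =
      if length ts = length ps
      then List.concat (ListPair.map patenv (ts, ps))
      else raise TypeError "tuple pattern has the wrong number of components"
  | patenv (_, TuplePat _) =
      raise TypeError "tuple pattern at a non-tuple type"

(* True when no name occurs twice in the list. *)
fun unique [] = true
  | unique (x :: xs) =
      not (List.exists (fn y => y = x) xs) andalso unique xs

(* patcheck (env, t, p): env extended with the bindings of p. *)
fun patcheck (env : environment, t : ty, p : pat) : environment =
  let
    val env' = patenv (t, p)
  in
    if unique (map (fn (x, _) => x) env')
    then env' @ env   (* pattern bindings shadow older ones *)
    else raise TypeError "variable bound twice in one pattern"
  end
```

Separating patenv from patcheck keeps the uniqueness test in one place: patenv collects bindings without looking at the surrounding environment, and patcheck validates and merges them.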

This simple treatment does not deal with type inference, but it turns out to be relatively straightforward to do so. In addition, this treatment does not deal with polymorphism, but again, this is relatively simple to add to the type checker. Finally, we aren't specifying how to check that a list of patterns is exhaustive and does not overlap. This turns out to be fairly tricky to implement.
