Type Inference

Type systems are very useful for exposing programmer mistakes. However, writing down type annotations may become a tedious burden, especially as type systems become more expressive. Type systems generally don't require programmers to annotate every term with a type, but some languages go further, inferring many more type annotations.

We've already seen an example in the Xi language where type inference could come in handy: the typing of array literals. The obvious typing rule for array literals requires that every element have the element type:

\[
\frac{\Gamma \vdash e_i : t \;\;(\forall i \in 1..n) \qquad n \ge 0}
     {\Gamma \vdash \{e_1, \ldots, e_n\} : t[\,]}
\;\text{(ArrayLit)}
\]

As long as $n > 0$, this rule is syntax-directed and no type annotations are needed to determine the type of the array. But what about the expression {}, where $n = 0$? In this case, the type checker doesn't know what type $t$ to choose. Yet we'd like to be able to type-check statements like the following:

a: int[][] = {}

Another classic place where type inference shows up is in statically typed functional programming languages. For example, in OCaml, we can write an expression like fun x → x + 1 and the compiler will figure out, even without a type declaration for the argument x, that the expression has type int → int. If we look at the typing rule for function expressions (aka lambda expressions), we see that it is also not syntax-directed:

\[
\frac{\Gamma, x : t \vdash e : t'}
     {\Gamma \vdash \texttt{fun}\ x \to e : t \to t'}
\;\text{(Lambda)}
\]

The rule does not say how to generate the type t of the formal parameter x, yet the ordinary implementation of this rule would be to type-check the body e in a context in which x is bound to type t.

In both cases, type inference can be used to figure out the missing types automatically. The idea is to introduce type variables that will be solved for as part of the type-checking process. We represent these type variables by introducing a new metavariable syntax for types appearing in typing rules: T. The type checker proceeds as if type variables Ti represented types, but along the way, collects constraints on these type variables that must be solved for the typing derivation to be valid. So that all the constraints can be extracted at the end without creating name conflicts between the constraints, each rule that introduces a type variable generates a fresh name that is used nowhere else in the typing derivation.
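
To make this concrete, here is a minimal OCaml sketch of how an inference-oriented checker might represent types, type variables, and collected constraints. The names here (ty, TVar, constr, fresh) are illustrative, not taken from any particular compiler:

    (* Types extended with type variables. A type variable T_i is
       identified by a unique integer id. *)
    type ty =
      | Int                    (* base type *)
      | Bool                   (* base type *)
      | Arrow of ty * ty       (* t1 -> t2 *)
      | Array of ty            (* t[] *)
      | TVar of int            (* type variable T_i *)

    (* An equality constraint t = t' collected during checking. *)
    type constr = ty * ty

    (* Generate a type variable with a fresh name, used nowhere else
       in the typing derivation. *)
    let counter = ref 0
    let fresh () : ty = incr counter; TVar !counter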

For example, we can write the Xi array literal rule to introduce such a type variable:

\[
\frac{\Gamma \vdash e_i : t_i \;\;(\forall i \in 1..n) \qquad n \ge 0 \qquad \textcolor{blue}{t_i = T \;\;(\forall i \in 1..n)} \qquad T \text{ fresh}}
     {\Gamma \vdash \{e_1, \ldots, e_n\} : T[\,]}
\;\text{(ArrayLit)}
\]

The third premise, in blue, is a set of constraints requiring that the element type of the array be equal to each of the individual element types. If $n = 0$, there will be no such constraints, so the type $T$ is free to be chosen based on how the array literal is used, as we would like.
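
Under the representation sketched above, the ArrayLit rule might be transcribed as follows, where check stands for whatever function type-checks a single expression, returning its type together with the constraints collected along the way (a sketch, not a definitive implementation):

    (* ArrayLit: create T fresh, check each element e_i, and emit the
       blue constraints t_i = T. For an empty literal, no constraints
       are produced, leaving T free, exactly as the rule allows. *)
    let check_array_lit (check : 'ctx -> 'exp -> ty * constr list)
                        (ctx : 'ctx) (elems : 'exp list)
                        : ty * constr list =
      let t = fresh () in                            (* T fresh *)
      let results = List.map (check ctx) elems in
      let constrs =
        List.concat_map (fun (ti, cs) -> (ti, t) :: cs) results
      in
      (Array t, constrs)                             (* {e1,...,en} : T[] *)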

Similarly, we can write the rule for typing a variable declaration in a way that is amenable to type inference. Again the constraint premise left to be solved out of band is shown in blue.

\[
\frac{\Gamma \vdash e : t' \qquad \textcolor{blue}{t = t'}}
     {\Gamma \vdash x\,{:}\,t = e \,:\, \mathbf{1} \,\dashv\, \Gamma, x : t}
\;\text{(VarInit)}
\]

The rules for checking functions and their uses can be written in the same way:

\[
\frac{\Gamma, x : T \vdash e : t \qquad T \text{ fresh}}
     {\Gamma \vdash \texttt{fun}\ x \to e : T \to t}
\;\text{(Lambda)}
\]
\[
\frac{\Gamma \vdash e_1 : t_1 \qquad \Gamma \vdash e_2 : t_2 \qquad T_3 \text{ fresh} \qquad \textcolor{blue}{t_1 = t_2 \to T_3}}
     {\Gamma \vdash e_1(e_2) : T_3}
\;\text{(App)}
\]
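
These two rules also transcribe almost directly into code. The following sketch assumes a tiny hypothetical expression AST and contexts represented as association lists:

    type exp =
      | Num of int
      | Id of string
      | Fun of string * exp                  (* fun x -> e *)
      | App of exp * exp                     (* e1(e2) *)

    type ctx = (string * ty) list

    (* Infer a type for e, collecting the constraints to be solved. *)
    let rec infer (ctx : ctx) (e : exp) : ty * constr list =
      match e with
      | Num _ -> (Int, [])
      | Id x -> (List.assoc x ctx, [])       (* raises if x is unbound *)
      | Fun (x, body) ->
          let t = fresh () in                (* T fresh *)
          let t', cs = infer ((x, t) :: ctx) body in
          (Arrow (t, t'), cs)                (* fun x -> e : T -> t *)
      | App (e1, e2) ->
          let t1, cs1 = infer ctx e1 in
          let t2, cs2 = infer ctx e2 in
          let t3 = fresh () in               (* T3 fresh *)
          (t3, (t1, Arrow (t2, t3)) :: cs1 @ cs2)   (* t1 = t2 -> T3 *)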

To see how these rules work, consider type-checking the example above. The rules are completely syntax-directed:

\[
\frac{\dfrac{}{\Gamma \vdash \{\} : T[\,]}\;\text{(ArrayLit)} \qquad \textcolor{blue}{\texttt{int[][]} = T[\,]}}
     {\vdash \texttt{a: int[][] = \{\}} \,:\, \mathbf{1} \,\dashv\, a : \texttt{int[][]}}
\;\text{(VarInit)}
\]

After constructing the proof tree, we are left with the constraint $\texttt{int[][]} = T[\,]$, which is easily solved by setting $T = \texttt{int[]}$. Substituting $\texttt{int[]}$ for $T$, we arrive at a valid derivation in the original type system, while inferring the right element type for the array.

Unification

In general, the set of type constraints will not be so easy to solve. Solving type equations can be posed as a problem of unification: given a set of type equations $\{t_1 = t_1',\; t_2 = t_2',\; \ldots,\; t_n = t_n'\}$, the goal is to find a maximally weak substitution $S$ such that $S(t_i) = S(t_i')$ for all $i$.

Here, a substitution is a function mapping type variables to types, written $\{T_1 \mapsto t_1,\, T_2 \mapsto t_2,\, \ldots\}$; its action when applied to a type expression is to replace all occurrences of the type variables it maps. We say that a substitution $S_1$ is weaker than another substitution $S_2$ (or more precisely, no stronger than it) if there exists some substitution $S_3$ such that $S_2 = S_3 \circ S_1$. If both $S_1$ and $S_2$ suffice to solve the equations, $S_3$ represents an unnecessary substitution. For example, the equation $T_1[\,] = T_2[\,]$ is solved both by the substitution $S_1 = \{T_1 \mapsto T_2\}$ and by the substitution $S_2 = \{T_1 \mapsto \texttt{int},\, T_2 \mapsto \texttt{int}\}$. In this case, $S_1$ is the weaker substitution, as witnessed by the substitution $S_3 = \{T_2 \mapsto \texttt{int}\}$, which is not necessary to satisfy the equation.
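
As a sketch, substitutions over the ty representation above can be implemented as association lists from type-variable ids to types, with composition defined so that $S_3 \circ S_1$ applies $S_1$ first (the names subst, apply, and compose are illustrative):

    (* A substitution {T1 ↦ t1, T2 ↦ t2, ...}. *)
    type subst = (int * ty) list

    (* Apply a substitution, replacing all occurrences of the type
       variables it maps; unmapped variables are left alone. *)
    let rec apply (s : subst) (t : ty) : ty =
      match t with
      | Int -> Int
      | Bool -> Bool
      | Arrow (t1, t2) -> Arrow (apply s t1, apply s t2)
      | Array t1 -> Array (apply s t1)
      | TVar i -> (try List.assoc i s with Not_found -> TVar i)

    (* compose s3 s1 behaves like s3 ∘ s1:
       apply (compose s3 s1) t = apply s3 (apply s1 t). *)
    let compose (s3 : subst) (s1 : subst) : subst =
      List.map (fun (i, t) -> (i, apply s3 t)) s1 @ s3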

Robinson's algorithm

Robinson's algorithm (1965) finds a weakest substitution unifying a set of equations $\{t_1 = t_1',\; t_2 = t_2',\; \ldots,\; t_n = t_n'\}$. Let $E$ be a set of equations. The result of unification is defined recursively:

\begin{align*}
\mathit{unify}(\emptyset) &= [\,] \\
\mathit{unify}(B = B,\; E) &= \mathit{unify}(E) &&\text{where $B$ is some base type such as \texttt{int}}\\
\mathit{unify}(T = T,\; E) &= \mathit{unify}(E) &&\text{where $T$ is a type variable}\\
\mathit{unify}(t_1 \to t_2 = t_1' \to t_2',\; E) &= \mathit{unify}(t_1 = t_1',\; t_2 = t_2',\; E) \\
\mathit{unify}(t[\,] = t'[\,],\; E) &= \mathit{unify}(t = t',\; E) \\
\mathit{unify}(T = t,\; E) = \mathit{unify}(t = T,\; E) &= \mathit{unify}(S(E)) \circ S &&\text{where $S = \{T \mapsto t\}$; solves for $T$}
\end{align*}

Any unification problem that does not match one of these cases is a failure: there is no way to solve the equations. In particular, the last rule should be applied only when $T$ does not occur in $t$ (the occurs check); an equation like $T = T[\,]$ has no finite solution.
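
The equations translate almost line for line into a purely functional OCaml sketch, built on the subst machinery above and including the occurs check for the circular case:

    (* Does type variable i occur in t? *)
    let rec occurs (i : int) (t : ty) : bool =
      match t with
      | Int | Bool -> false
      | Arrow (t1, t2) -> occurs i t1 || occurs i t2
      | Array t1 -> occurs i t1
      | TVar j -> i = j

    (* Robinson's algorithm: unify a list of equations, producing a
       weakest substitution, or fail. *)
    let rec unify (eqs : (ty * ty) list) : subst =
      match eqs with
      | [] -> []
      | (Int, Int) :: rest | (Bool, Bool) :: rest -> unify rest
      | (TVar i, TVar j) :: rest when i = j -> unify rest
      | (Arrow (t1, t2), Arrow (t1', t2')) :: rest ->
          unify ((t1, t1') :: (t2, t2') :: rest)
      | (Array t, Array t') :: rest -> unify ((t, t') :: rest)
      | (TVar i, t) :: rest | (t, TVar i) :: rest ->
          if occurs i t then failwith "circular type"  (* occurs check *)
          else
            let s = [ (i, t) ] in                      (* S = {T ↦ t} *)
            let rest' =
              List.map (fun (a, b) -> (apply s a, apply s b)) rest in
            compose (unify rest') s                    (* unify(S(E)) ∘ S *)
      | _ -> failwith "cannot unify"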

It is not immediately obvious that the recursive definition of unify is well founded. To see that it describes a terminating computation, observe that each recursive call either decreases the total number of type variables, or keeps that number the same while making the set of equations syntactically smaller. The last rule, which solves for and eliminates a variable, makes the real progress; the other rules simplify equations in a way that exposes type variables so the last rule can be applied.

Implementing unification

A simple and efficient way to implement unification is with imperative update. This approach is not only easy to code but can also lead to an asymptotically more compact representation of the types that are solved for.

We represent each type variable as a mutable cell that is initially empty but can be made to contain a pointer to a type expression. When a type variable is solved for using the last rule above, its cell is updated to point to the corresponding type expression. In general, the type expression can be another type variable, in which case a pointer is created from one box to the next. A chain of such pointers may be created; when such a chain is traversed, path compression should be applied to make all the variables along the path point to the last expression in the path. With path compression, unification takes near-linear time even when the inferred types take exponential space when fully expanded!
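
A sketch of this representation in OCaml: each type variable holds a mutable cell, resolve chases and compresses pointer chains, and solving a variable is a single assignment. (A product type is included for the example that follows; a production version would also perform an occurs check before linking.)

    type ity =
      | IInt
      | IArrow of ity * ity
      | IProd of ity * ity               (* t1 * t2 *)
      | IArray of ity                    (* t[] *)
      | IVar of cell ref
    and cell =
      | Unbound of int                   (* unsolved, unique id *)
      | Link of ity                      (* solved: points to a type *)

    let fresh_ivar =
      let n = ref 0 in
      fun () -> incr n; IVar (ref (Unbound !n))

    (* Follow a chain of solved variables, making every variable on
       the path point directly at the end (path compression). *)
    let rec resolve (t : ity) : ity =
      match t with
      | IVar ({ contents = Link t' } as cell) ->
          let r = resolve t' in
          cell := Link r;
          r
      | _ -> t

    (* Destructive unification: solving a variable is one update. *)
    let rec unify (t : ity) (t' : ity) : unit =
      match resolve t, resolve t' with
      | IInt, IInt -> ()
      | IVar c, IVar c' when c == c' -> ()
      | IArrow (a, b), IArrow (a', b')
      | IProd (a, b), IProd (a', b') -> unify a a'; unify b b'
      | IArray a, IArray a' -> unify a a'
      | IVar cell, t | t, IVar cell -> cell := Link t   (* solve *)
      | _ -> failwith "cannot unify"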

For example, consider the singleton set of type equations $E = \{\, T_1 \to T_1[\,] \,=\, (\texttt{int} * T_2) \to T_3 \,\}$. Pictorially, we can represent the progress of the unification algorithm as shown below. The dotted green lines represent equations. As each equation is processed, the algorithm either adds new equations or solves a type variable.

[Figure: Unifying $E = \{\, T_1 \to T_1[\,] = (\texttt{int} * T_2) \to T_3 \,\}$.]

The full solution can be read out by following the pointers at the end of the algorithm: $T_1 = \texttt{int} * T_2$ and $T_3 = T_1[\,] = (\texttt{int} * T_2)[\,]$. The algorithm does not substitute for $T_2$ because that is unnecessary.
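
For instance, running the imperative sketch above on this example reproduces the pointer structure just described; after unification, resolving $T_3$ follows the pointers to $(\texttt{int} * T_2)[\,]$ without ever touching $T_2$:

    let () =
      let t1 = fresh_ivar () in
      let t2 = fresh_ivar () in
      let t3 = fresh_ivar () in
      (* E = { T1 -> T1[]  =  (int * T2) -> T3 } *)
      unify (IArrow (t1, IArray t1)) (IArrow (IProd (IInt, t2), t3));
      (* t1 now links to int * T2, and t3 links to T1[]. *)
      ignore (resolve t3)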

The Hindley–Milner algorithm

The algorithm above does not handle the style of parametric polymorphism present in functional languages like OCaml (called let-polymorphism). In these languages, we can define polymorphic values that are instantiated at their uses. For example, OCaml allows us to define a polymorphic identity function and apply it to multiple types:

let id x = x in
    id 42;
    id true

The algorithm interleaves unification with type checking to find polymorphic types. After solving the constraints collected through the declaration of the variable id, the type of id is known only to have the form $T \to T$, where the variable $T$ appears in no other types in the context. Since $T$ is completely unconstrained, the algorithm concludes that it can give id the type schema $\forall \alpha.\, \alpha \to \alpha$, where $\alpha$ is a type variable that can be instantiated arbitrarily. At the two uses of id, the type schema is instantiated with distinct fresh type variables, call them $T_1$ and $T_2$. These type variables are then solved as $T_1 = \texttt{int}$ and $T_2 = \texttt{bool}$, so that id has the type $\texttt{int} \to \texttt{int}$ at the first use and the type $\texttt{bool} \to \texttt{bool}$ at the second.
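
The two key operations, generalization at a let and instantiation at each use, might be sketched as follows on top of the functional ty and subst machinery above (the record type scheme and the helper names are illustrative):

    (* A type schema ∀α... t: the quantified variables are those that
       appear in the type but nowhere else in the context. *)
    type scheme = { quantified : int list; body : ty }

    (* Free type variables of a type. *)
    let rec ftv (t : ty) : int list =
      match t with
      | Int | Bool -> []
      | Arrow (t1, t2) -> ftv t1 @ ftv t2
      | Array t1 -> ftv t1
      | TVar i -> [i]

    (* Generalize: quantify the variables free in t but not free in
       the context (ctx_vars lists the context's free variables). *)
    let generalize (ctx_vars : int list) (t : ty) : scheme =
      { quantified =
          List.filter (fun i -> not (List.mem i ctx_vars)) (ftv t);
        body = t }

    (* Instantiate: each use gets distinct fresh variables, e.g. the
       two uses of id above get T1 and T2. *)
    let instantiate (s : scheme) : ty =
      let sub = List.map (fun i -> (i, fresh ())) s.quantified in
      apply sub s.body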

In practice, the Hindley–Milner algorithm takes near-linear time.