Lecture 24:
An ML Interpreter: Putting Everything Together

The time has now come for us to put together a lot of what we have learned about SML in general, and the environment model in particular. We will test - and expand - our knowledge by creating and modifying an interpreter for a subset of SML. To distinguish between "true" SML and the implemented language subset, we will call the latter mini-ML.

The complete code for the interpreter is available here; download and unzip the archive, then open and edit the sources.cm file to adjust the path to the SML libraries. You can access the zipped handouts for the original interpreter here. The handouts are in PostScript; if you can not view or print PostScript documents, you can download and install Ghostscript, Ghostview and GSview.

You can understand the discussion below much better if you try to solve the list of interpreter problems we propose in lecture 25. We have collected these - and some non-interpreter related problems - into our list of suggested problems. We have provided solutions to selected problems. An updated version of the interpreter is available - it includes some of the code discussed in our solutions.

Mini-ML

Mini-ML's built-in basic types are integers, booleans, and strings; the language has tuples of arity greater than or equal to 2 (thus we do not recognize unit). The usual operators on integers and booleans are all implemented. For strings, only the concatenation operator is implemented.

If statements and let statements are implemented, but case statements and pattern matching are not available.

Val and fun declarations are allowed, but only in let statements; only one declaration is allowed in each let statement. Anonymous functions can also be defined. The fun declaration allows for the definition of recursive functions; using the and keyword one can define pairs of mutually recursive functions (but it is not possible to define, say, three or more mutually recursive functions).

It is possible to explicitly raise exceptions in mini-ML, and to define exception handlers.

Here are a few samples of mini-ML (more extensive examples are given in one of the handouts):

MiniML> (1, true)
(1, true)

MiniML> #3 (true, "second", 5)
5

MiniML> 4 + ((raise Fail "oh no") + 5 handle Fail x => 3)
7

MiniML> let
          val f = fn (p: (string -> int) * int) => ((#1 p) "alpha") + (#2 p)
        in
          f(fn (s: string) => 3, 4)
        end
7

MiniML> let
          fun even(n:int):bool = if n = 0
                                 then  true
                                 else  odd(n-1)
          and odd(n:int):bool = if n = 0
                                then  false
                                else even(n-1)
        in
          (even(15),odd(12),even(99),odd(7885))
        end

(false, false, false, true)

Note: We have formatted the mini-ML samples to enhance readability; mini-ML requires all input to be provided on one line, with no semicolon at the end of the line. The only legal mini-ML input is an expression (as opposed to a declaration, see below). Declarations can be embedded in expressions, but can not appear at the top level.

The BNF grammar of mini-ML is given below:

(* unary operators *)
unop  ::= not | ~

(* binary operators *)
binop ::= + | - | * | div | mod | = | <> | > | >= | < | <= | ^ | andalso | orelse

(* declarations *)
d     ::= fun f(x : type) : type = e
        | fun f(x : type) : type = e1 and g(x : type) : type = e2
        | val identifier = e

(* types *)
type  ::= int
        | string
        | bool
        | type1 * type2 * ... * typen
        | type1 -> type2

(* expressions *)
e     ::= integer_constant
	| string_constant
	| identifier
	| true
	| false
	| (e1, e2, ..., en)
	| #integer e
	| fn identifier : type => e
	| if e1 then e2 else e3
	| e1 binop e2
	| let d in e end
	| unop e
	| e1 handle identifier identifier => e2
	| raise identifier e

Strictly speaking, the grammar is incomplete; for example, we did not define the meaning of integer, string, or identifier. This is not a big limitation, as these symbols have their usual SML meaning.

We note that all binary operators have been lumped together; however, we must remember that the short-circuiting semantics of andalso and orelse makes these operators special (they do not necessarily evaluate both their arguments).

The grammar requires that type information be provided in certain expressions (but not in a val declaration!). However, the grammar does not impose any compatibility or consistency requirements on the various types. For example, it allows expressions like true = (1, "alpha"), which is syntactically valid but ill-typed.
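Ruling out such expressions is the job of a type checker, which runs after parsing. The following is only a minimal sketch of the idea, not the interpreter's actual checker: the constructors STRING_T, TUPLE_T and EQ_E and the function typeof are names assumed for this illustration.

```sml
(* Hypothetical, trimmed-down types; only INT_T and BOOL_T appear in the notes. *)
datatype typ = INT_T | BOOL_T | STRING_T | TUPLE_T of typ list

datatype exp =
    INT_E of int
  | STRING_E of string
  | TRUE_E
  | FALSE_E
  | TUPLE_E of exp list
  | EQ_E of exp * exp            (* stands in for BINOP_E (e1, EQ_B, e2) *)

(* Returns SOME t if the expression has type t, NONE if it is ill-typed. *)
fun typeof (e : exp) : typ option =
  case e of
      INT_E _ => SOME INT_T
    | STRING_E _ => SOME STRING_T
    | TRUE_E => SOME BOOL_T
    | FALSE_E => SOME BOOL_T
    | TUPLE_E es =>
        let
          val ts = map typeof es
        in
          if List.all isSome ts then SOME (TUPLE_T (map valOf ts)) else NONE
        end
    | EQ_E (e1, e2) =>
        (case (typeof e1, typeof e2) of
             (SOME t1, SOME t2) => if t1 = t2 then SOME BOOL_T else NONE
           | _ => NONE)

(* true = (1, "alpha") parses, but both sides of = have different types. *)
val ill_typed = typeof (EQ_E (TRUE_E, TUPLE_E [INT_E 1, STRING_E "alpha"]))
val well_typed = typeof (EQ_E (INT_E 1, INT_E 2))
```

Under these assumptions ill_typed is NONE and well_typed is SOME BOOL_T.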

High-Level Description of the Interpreter

After compiling the interpreter with the CM.make() command, one launches it using the ReadEvalPrint.main() function call. The ReadEvalPrint function (see file readevalprint.sml) sets up a loop by calling itself recursively. At each call, the function reads a line of text from the input; if the line is not empty, it is parsed. Absent any syntax errors, the result of parsing is an abstract syntax tree representation of the input line (more about this later). The abstract syntax tree is then checked for type errors; if it is correct, it is subsequently evaluated and the result is displayed. The "read an expression, (parse, typecheck and) evaluate it, then print the result" loop is then executed again. Errors that occur in the various stages of processing are all handled appropriately (please refer to the source code for details).

The lexical analyzer and the parser are specified in files mini-ml.lex and mini-ml.grm. You need neither change nor understand these files, but you should feel free to examine them for your own benefit.

The Abstract Syntax Tree

We are familiar with the notion of abstract syntax trees, as we have encountered them before. The abstract syntax tree of a mini-ML expression can be generated by calling the Parser.parseString command.

- Parser.parseString "(1, true)";
val it = TUPLE_E [INT_E 1,TRUE_E] : AbstractSyntaxTree.exp

- Parser.parseString "#3 (true, \"second\", 5)";
val it = PROJ_E (3,TUPLE_E [TRUE_E,STRING_E #,INT_E #])
  : AbstractSyntaxTree.exp

- Parser.parseString ("let fun even(n:int):bool = if n = 0 then  true " ^ 
                      "else  odd(n-1) and odd(n:int):bool = if n = 0 then " ^
                      "false else even(n-1) in " ^ 
                      "(even(15),odd(12),even(99),odd(7885)) end");
val it =
  LET_E
    (AND_D
       ("even","n",INT_T,BOOL_T,
        IF_E
          (BINOP_E (VAR_E "n",EQ_B,INT_E 0),TRUE_E,
           APP_E (VAR_E "odd",BINOP_E (VAR_E "n",MINUS_B,INT_E 1))),"odd","n",
        INT_T,BOOL_T,
        IF_E
          (BINOP_E (VAR_E "n",EQ_B,INT_E 0),FALSE_E,
           APP_E (VAR_E "even",BINOP_E (VAR_E "n",MINUS_B,INT_E 1)))),
     TUPLE_E
       [APP_E (VAR_E "even",INT_E 15),APP_E (VAR_E "odd",INT_E 12),
        APP_E (VAR_E "even",INT_E 99),APP_E (VAR_E "odd",INT_E 7885)])
  : AbstractSyntaxTree.exp

The types used to represent the abstract syntax tree are described in the structure AbstractSyntaxTree (see file abstractsyntaxtree.sml); the relevant types are binop, unop, exception_name, typ, decl, and exp. These types map almost exactly to the BNF grammar of mini-ML given above.

Expressions and Values

It is important to distinguish between expressions and values. To understand this distinction we start with a simple example, and we assume that the user types 2<enter> at the mini-ML prompt. Here are the various stages of processing that this input is subjected to:

The input is read in as a string, in this case "2".
The string is parsed and converted to an abstract syntax tree (or an expression), in this case INT_E(2).
The abstract syntax tree is type-checked; it is correct.
The abstract syntax tree (i.e. the expression) is evaluated; the result is a value, in this case INT_V(2):

- Evaluator.eval(Parser.parseString "2", AbstractSyntaxTree.empty_env);
val it = INT_V 2 : AbstractSyntaxTree.value

The value resulting from the evaluation is converted to a string - see function PrettyPrinter.value2str - and printed.

In this case the input ("2"), the abstract syntax tree (INT_E 2), the resulting value (INT_V 2), and the result (again, string "2") look very similar. Still, it is important to distinguish between user input, the result of the parse phase (always an abstract syntax tree, i.e. an expression), the value that results from the evaluation of the expression, and the form in which the result is printed. Of course, in general the form of an expression will be very different from the form of its corresponding result.

The type that describes values is value and it is defined in structure AbstractSyntaxTree. Datatype constructors are provided for basic types (integers, booleans, and strings), as well as for tuples and closures.

Environments

Mini-ML does not allow for side-effects, and this greatly simplifies the implementation of environments. Our environment will consist of an association list between identifiers and values. The environment type is defined in structure AbstractSyntaxTree as:

type environment = (id * value) list

Note: The actual definition uses withtype and not type; this keyword is necessary to resolve the mutual recursion in the definition of the value and environment types.
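To see why withtype is needed, consider the trimmed-down sketch below. It is an assumption-laden illustration rather than the contents of abstractsyntaxtree.sml: only a few constructors are shown, and the exact argument types of FN_V are inferred from the description of closures in these notes.

```sml
type id = string

(* A tiny stand-in for the real expression type. *)
datatype exp = INT_E of int | VAR_E of id

(* value mentions environment (inside closures), and environment mentions
   value, so the abbreviation must be introduced together with the datatype. *)
datatype value =
    INT_V of int
  | FN_V of id * exp * environment ref
withtype environment = (id * value) list

val empty_env : environment = []
val env1 : environment = [("x", INT_V 1)]

(* A closure whose captured environment is env1. *)
val closure : value = FN_V ("n", VAR_E "x", ref env1)
```

With a plain type declaration placed before the datatype, value would not yet be defined; placed after it, environment could not be used inside FN_V.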

The environment will be used essentially like a stack; new bindings will be added to the head of the list, and bindings added later will be removed before bindings added earlier. As we will see, the removal of bindings will, in fact, happen implicitly, as an effect of returning from a recursive function call. Special manipulation of the environment will also be necessary to handle recursive and mutually recursive functions.

In the mini-ML example below, which we have reformatted by breaking it up into several lines for readability, we show the evolution of the environment at various points in the program:

MiniML> let
          (* env = AbstractSyntaxTree.empty_env = [] *)
	  val x = 1
          (* env = [("x", INT_V 1)] *)
	in
	  let
           (* env = [("x", INT_V 1)] *)
	    val y = let
                      (* env = [("x", INT_V 1)] *)
		      val x = 3
                      (* env = [("x", INT_V 3), ("x", INT_V 1)] *)
		    in
		      x * x
		    end
           (* env = [("y", INT_V 9), ("x", INT_V 1)] *)
	  in 
	    (x, y * y)
	  end
         (* env = [("x", INT_V 1)] *)
	end
(1, 81)

Note: mini-ML allows for regular SML comments anywhere inside expressions.

The interpreter starts the evaluation of each expression with an empty environment (see function ReadEvalPrint.loop); this means that no predefined functions are known, nor are functions or values defined by previous inputs accessible to the expression specified by the current input line.

There are only two operations that we can explicitly perform with an environment: we can look up an identifier in the current environment (see function AbstractSyntaxTree.env_lookup), or we can expand the current environment with the addition of a new binding (see function AbstractSyntaxTree.env_add). As we mentioned before, the removal of a binding occurs implicitly.
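The two operations can be sketched as follows. The argument order of env_add matches its use in the evaluator code shown later in these notes; the signature of env_lookup (returning an option) is an assumption made for this illustration, and the real function may signal a missing identifier differently.

```sml
type id = string
datatype value = INT_V of int
type environment = (id * value) list

val empty_env : environment = []

(* New bindings go to the head of the list, so they shadow older ones. *)
fun env_add (x : id, v : value, env : environment) : environment =
  (x, v) :: env

(* The first match wins; this is what makes shadowing work. *)
fun env_lookup (x : id, env : environment) : value option =
  case env of
      [] => NONE
    | (y, v) :: rest => if x = y then SOME v else env_lookup (x, rest)

val outer = env_add ("x", INT_V 1, empty_env)
val inner = env_add ("x", INT_V 3, outer)

val seen_inside = env_lookup ("x", inner)    (* the inner x *)
val seen_outside = env_lookup ("x", outer)   (* outer is unchanged *)
```

Because lists are immutable, extending outer does not modify it. Once the evaluation that used inner returns, the caller still holds outer, which is exactly the implicit removal of bindings described above.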

Closures

Closures are representations of functions; in mini-ML they are implemented with the help of the FN_V datatype constructor (see type AbstractSyntaxTree.value). A closure consists of the name of the unique argument (see below), the expression (i.e. abstract syntax tree) defining the function body, and a reference to the environment in which the closure was created. We use a reference to the environment for two reasons:

First, by storing a reference instead of the actual environment, we save space.
Second - and more importantly - the use of references allows for the creation of environments containing closures whose embedded references point to the environment itself. This is important when defining recursive, or mutually recursive functions.

We illustrate the use of references below; we chose trivial recursive functions in order to focus on the main topic - the environment.

- Evaluator.eval(
    Parser.parseString "let fun f (n:int):int = f n in f end",
    AbstractSyntaxTree.empty_env);

val it =
  FN_V
    ("n",APP_E (VAR_E "f",VAR_E "n"),
     ref [("f",FN_V ("n",APP_E (VAR_E "f",VAR_E "n"),%0))] as %0)
  : AbstractSyntaxTree.value

In the example above, note the FN_V datatype constructor nested inside the environment: the closure associated with f stores a pointer to the very environment that contains the binding for f (the cycle is printed as %0). As we have discussed when we introduced the environment model, such references are essential to create recursive functions.

Note that a mini-ML function can only have one argument, but the respective argument can be a tuple. Since mini-ML does not have 0-tuples, functions with 0 arguments can not be defined. This limitation can be overcome by defining a dummy argument that is not used in the function body.

Exceptions

Exceptions in mini-ML are meant to convey information about unusual situations that occur during the execution of the program, and that can not be easily handled using normal control statements. Similarly to SML, an exception can be raised using the raise statement: raise ExceptionName "message". The name of the exception must match the name given in a handle statement (see below).

A mini-ML exception can be raised at any point where an expression is allowed; an exception can stand in for a value of any type. For example, if a function expects an argument of type int * int, the tuple (1, raise MiniMLException "bad") is a type-compatible argument to the respective function.

An exception can be caught and handled using the handle statement. The first argument of this statement must match the name of the exception that has been raised; the second argument - an identifier - will be bound to the string representing the message that the exception carries. Here are a few examples that illustrate the interaction between raise and handle:

MiniML> (raise Fail "will not be caught") handle SomeException s => s
exception Fail with message  "will not be caught"

MiniML> (raise Fail "was caught") handle Fail s => ("Yes, it was " ^ s)
"Yes, it was was caught"

Note: Mini-ML has a parser that is less robust than that of SML. Occasionally you will find that you need to add parentheses around certain expressions to "convince" the parser to interpret them in a certain way. We illustrate this point by reproducing one of the commands above without the parentheses:

 
MiniML> raise Fail "was caught" handle Fail s => ("Yes, it was " ^ s)
exception Fail with message  "was caught"

This is a "feature" of our interpreter, but not an error. We accept such limitations in the interest of simplicity, in order to be able to focus on the more important issues that we are interested in (e.g. how to make mutually recursive functions possible).

Special Forms

Special forms are similar to mini-ML functions, except that they are predefined, and they have full control over the evaluation of their arguments. Special forms can be thought of as non-standard mini-ML statements.

Consider, for example, the special form nth_eval, which takes k+1 arguments, the first of which must evaluate to an integer n. If n is in the range 1..k, then the special form evaluates its (n+1)-th argument and returns its value; no other arguments are evaluated. If n is not in the correct range, a run-time exception is raised.

It is easy to see that nth_eval can not be implemented as a function. Consider the following example:

MiniML> let
          fun nth_eval(arg: int * int * int): int = 
             if (#1 arg) = 1
             then (#2 arg)
             else (#3 arg)
        in
          nth_eval(1, 5, raise Fail "exception will be raised")
        end

From the semantics of nth_eval, the result should be 5. However, if nth_eval is implemented as a function, an eager mini-ML interpreter will evaluate all the function arguments before the function call, thus triggering the exception.

A special form has none of these problems, as it has full control over the evaluation of its arguments. A special form can decide to evaluate or not to evaluate any of its arguments, or to evaluate its arguments in an arbitrary environment.
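The effect of eager evaluation can be observed in SML itself. The sketch below is written in full SML, not mini-ML (it relies on unit and lists, which mini-ML lacks), and the names nth_fun and nth_lazy are made up for the illustration. The second version delays each alternative inside a function, which is one way to imitate the control that a special form has over its arguments.

```sml
(* Function version: the whole tuple is evaluated before the call. *)
fun nth_fun (arg : int * int * int) : int =
  if #1 arg = 1 then #2 arg else #3 arg

(* The raise fires while the argument tuple is being built,
   so nth_fun is never entered; we catch it to record what happened. *)
val eager_result =
  (nth_fun (1, 5, raise Fail "argument was evaluated")) handle Fail _ => ~1

(* Delayed version: each alternative is wrapped in a function and is
   only run if it is the one selected. *)
fun nth_lazy (n : int, choices : (unit -> int) list) : int =
  if n >= 1 andalso n <= length choices
  then (List.nth (choices, n - 1)) ()
  else raise Fail "index out of range"

val lazy_result =
  nth_lazy (1, [fn () => 5, fn () => raise Fail "never evaluated"])
```

Here eager_result is ~1 (the exception was triggered) while lazy_result is 5. The delayed version changes the calling convention, though; a special form achieves the same effect without requiring the caller to wrap anything.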

The downside of the flexibility afforded by special forms is that special forms must be hardwired into the interpreter. However, none of the three special forms whose names are recognized by our interpreter are implemented yet; we will later provide implementations for these as exercises.

The Evaluator

The purpose of the evaluator is to convert an expression (abstract syntax tree) into a value. The evaluator is implemented in structure Evaluator (see file evaluator.sml). The evaluator consists of a set of mutually recursive functions: eval_binop, eval_unop, evaluateSpecialForm, and the main evaluation function eval.

Functions eval_binop and eval_unop are both very simple, and very similar; they call function eval to evaluate their arguments in the current environment, then perform the operation corresponding to the operator at hand on the values that have been obtained, and produce a resulting value, which is then returned. We note here that andalso and orelse are technically binary operators, but they are implemented in eval and not in eval_binop. This is because the special short-circuiting semantics of these two operators forbids the evaluation of their second (i.e. right-side) argument unless that is strictly necessary. In contrast, function eval_binop unconditionally evaluates both expressions that stand for the arguments of the respective binary operators.
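The division of labor between eval and eval_binop can be sketched on a toy evaluator. This is not the code from evaluator.sml: environments are omitted, and the names DIV_B, ANDALSO_B and BOOL_V, as well as the counter evaluated, are assumptions introduced only to make the skipped operand visible.

```sml
datatype binop = EQ_B | DIV_B | ANDALSO_B
datatype exp = INT_E of int | TRUE_E | FALSE_E | BINOP_E of exp * binop * exp
datatype value = INT_V of int | BOOL_V of bool

exception brokenTypes

(* Counts evaluated integer literals, so we can tell what was skipped. *)
val evaluated = ref 0

fun eval (e : exp) : value =
  case e of
      INT_E n => (evaluated := !evaluated + 1; INT_V n)
    | TRUE_E => BOOL_V true
    | FALSE_E => BOOL_V false
    | BINOP_E (e1, ANDALSO_B, e2) =>
        (* handled here: e2 is evaluated only if e1 is true *)
        (case eval e1 of
             BOOL_V false => BOOL_V false
           | BOOL_V true => eval e2
           | _ => raise brokenTypes)
    | BINOP_E (e1, b, e2) => eval_binop (b, e1, e2)

(* Every other operator evaluates both operands unconditionally. *)
and eval_binop (b : binop, e1 : exp, e2 : exp) : value =
  case (b, eval e1, eval e2) of
      (EQ_B, INT_V m, INT_V n) => BOOL_V (m = n)
    | (DIV_B, INT_V m, INT_V n) => INT_V (m div n)
    | _ => raise brokenTypes

(* false andalso (1 div 0 = 0): the division by zero is never attempted. *)
val guarded =
  BINOP_E (FALSE_E, ANDALSO_B,
           BINOP_E (BINOP_E (INT_E 1, DIV_B, INT_E 0), EQ_B, INT_E 0))
val result = eval guarded
```

If andalso were routed through eval_binop instead, evaluating guarded would raise Div before the operator was even inspected.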

Function eval is, as we stated, the main evaluation function. In general, the evaluation of the various expressions is straightforward, and needs but few explanations. We will examine in detail function applications and the evaluation of let statements.

The code that implements function applications (function calls) is shown below:

  ...
  | APP_E(e1, e2) =>
    (* call the special form, if it is one *)
    (case expToSpecialFormName e1 of
         SOME s => evaluateSpecialForm(s, e2, env)
       | NONE => 
         (case eval(e1, env) of
              EXCEPTION_V(x, s) => EXCEPTION_V(x, s)
            | FN_V(x, body, closure_env) =>
              (case eval(e2, env) of
                   EXCEPTION_V(x, s) => EXCEPTION_V(x, s)
                 | v2 => eval(body, env_add(x, v2, !closure_env)))
            | _ => raise brokenTypes))
  ...

Function applications have the form e1 e2, where e1 must evaluate to a closure. Since calls to special forms look like function applications, we first check whether we are dealing with a special form (i.e. if e1 is the name of a special form). If yes, the function handling special forms is called. Otherwise e1 is evaluated in the current environment. If no exception is raised during the evaluation and e1 evaluates to a closure, then e2 is evaluated. If no exception results, the closure's body is evaluated. The environment in which the body is evaluated is obtained by retrieving the reference to the environment in which the closure was defined, dereferencing it, and extending the environment with a binding that binds the (unique) formal argument to the value of e2. The evaluation occurs as a recursive call to eval using the body of the closure and the extended environment.

We take this opportunity to examine how mini-ML exception values are handled. Any time the evaluation of a subexpression yields an exception, the exception value is passed on (in fact: returned) immediately as the value that corresponds to the entire current expression being evaluated. The only exception to this is the evaluation of a handle statement.
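The propagation rule, and the way handle interrupts it, can be sketched with a toy evaluator. Only EXCEPTION_V is taken from the interpreter code shown above; the constructors PLUS_E, RAISE_E, HANDLE_E and STRING_V, and the helper lookup, are hypothetical names chosen for this illustration.

```sml
datatype exp =
    INT_E of int
  | STRING_E of string
  | VAR_E of string
  | PLUS_E of exp * exp                        (* stands in for BINOP_E *)
  | RAISE_E of string * exp                    (* raise Name e *)
  | HANDLE_E of exp * string * string * exp    (* e1 handle Name x => e2 *)

datatype value =
    INT_V of int
  | STRING_V of string
  | EXCEPTION_V of string * string             (* exception name, message *)

exception brokenTypes

fun lookup (x, []) = raise brokenTypes
  | lookup (x, (y, v) :: rest) = if x = y then v else lookup (x, rest)

fun eval (e, env) =
  case e of
      INT_E n => INT_V n
    | STRING_E s => STRING_V s
    | VAR_E x => lookup (x, env)
    | PLUS_E (e1, e2) =>
        (case eval (e1, env) of
             EXCEPTION_V p => EXCEPTION_V p    (* pass it on, skip e2 *)
           | INT_V m =>
               (case eval (e2, env) of
                    EXCEPTION_V p => EXCEPTION_V p
                  | INT_V n => INT_V (m + n)
                  | _ => raise brokenTypes)
           | _ => raise brokenTypes)
    | RAISE_E (name, e1) =>
        (case eval (e1, env) of
             STRING_V msg => EXCEPTION_V (name, msg)
           | EXCEPTION_V p => EXCEPTION_V p
           | _ => raise brokenTypes)
    | HANDLE_E (e1, name, x, e2) =>
        (case eval (e1, env) of
             EXCEPTION_V (raised, msg) =>
               if raised = name
               then eval (e2, (x, STRING_V msg) :: env)   (* bind the message *)
               else EXCEPTION_V (raised, msg)             (* not ours *)
           | v => v)

(* 4 + ((raise Fail "oh no") + 5 handle Fail x => 3) *)
val caught =
  eval (PLUS_E (INT_E 4,
                HANDLE_E (PLUS_E (RAISE_E ("Fail", STRING_E "oh no"), INT_E 5),
                          "Fail", "x", INT_E 3)),
        [])

(* (raise Fail "will not be caught") handle SomeException s => s *)
val uncaught =
  eval (HANDLE_E (RAISE_E ("Fail", STRING_E "will not be caught"),
                  "SomeException", "s", VAR_E "s"),
        [])
```

In this sketch caught is INT_V 7 and uncaught is EXCEPTION_V ("Fail", "will not be caught"), mirroring the mini-ML sessions shown earlier. Note that mini-ML exceptions are ordinary values of the interpreter, whereas brokenTypes is a genuine SML exception signaling an internal error.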

The handling of a val declaration (which can only occur embedded in a let statement) is straightforward - the current environment is extended with a binding for the newly declared identifier, and the extended environment is then used to evaluate the expression between the in and end parts of the respective let.

Function declarations using fun must also be embedded in let statements. The code that implements their evaluation is shown below:

  ...
  | LET_E(FUN_D(fnname, x, _, _, e1), e2) =>
      let
        val hole = ref []
        val env_with_fn_def = env_add(fnname, FN_V(x, e1, hole), env)
        val () = hole := env_with_fn_def
      in
        eval(e2, env_with_fn_def)
      end
  ...

Note how a closure is created by inserting into it a dummy reference to an empty environment (the hole). After the current environment is expanded with a binding that includes the closure, the reference in the closure is updated to point to the expanded environment. As we have discussed when presenting the environment model, this is the key step for defining recursive functions.

The definition of mutually recursive functions is similar, except that the environment must be extended twice (to include both newly defined functions) before the references in the two closures can be reset to the extended environment.
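The environment manipulation for a pair of mutually recursive functions might look like the sketch below. It is an illustration under assumptions, not the interpreter's AND_D case: the helper names bind_pair, names and captured_names are invented here, the types are trimmed down, and whether the real code uses one reference per closure or a shared one is not stated in these notes.

```sml
type id = string
datatype exp = VAR_E of id | APP_E of exp * exp

datatype value = FN_V of id * exp * environment ref
withtype environment = (id * value) list

fun env_add (x : id, v : value, env : environment) : environment =
  (x, v) :: env

(* Binds f and g so that each closure can see both names. *)
fun bind_pair (f : id, x : id, body_f : exp,
               g : id, y : id, body_g : exp,
               env : environment) : environment =
  let
    val hole_f : environment ref = ref []
    val hole_g : environment ref = ref []
    (* extend twice, with closures that still point to empty environments *)
    val env_f = env_add (f, FN_V (x, body_f, hole_f), env)
    val env_fg = env_add (g, FN_V (y, body_g, hole_g), env_f)
    (* only now patch both holes with the fully extended environment *)
    val () = (hole_f := env_fg; hole_g := env_fg)
  in
    env_fg
  end

val env =
  bind_pair ("even", "n", APP_E (VAR_E "odd", VAR_E "n"),
             "odd", "n", APP_E (VAR_E "even", VAR_E "n"),
             [])

fun names (e : environment) : id list = map (fn (name, _) => name) e

(* Names visible from inside the closure bound to the given identifier. *)
fun captured_names (name : id, e : environment) : id list =
  case List.find (fn (n, _) => n = name) e of
      SOME (_, FN_V (_, _, r)) => names (!r)
    | NONE => []
```

Both captured_names ("even", env) and captured_names ("odd", env) yield ["odd", "even"]. Had the holes been patched after the first extension only, the closure for even would not see odd.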

The evaluation of the other expressions is straightforward, but not always trivial. In the interest of brevity, we do not expand on all expressions here; however you must read and understand the code thoroughly. As the same ideas are applied again and again, the code contains a lot of repetitions, which should greatly ease your understanding.
