# A4: OCalf
**Soft deadline:** Thursday, 10/29/15, 11:59 pm
**Hard deadline:** Saturday, 10/31/15, 11:59 pm
*This assignment may be done as an individual or with one
partner. Sharing of code is permitted only between you and your partner;
sharing among larger groups is prohibited.*
A baby camel is called a "calf."
In this assignment you will implement an interpreter
for OCalf, a functional language containing many of the
core features of OCaml.
## Overview
Here is a Backus-Naur form (BNF) grammar for the syntax of OCalf expressions,
to give you a sense of what features are in the language:
```
e ::= (* expressions *)
| () | i | b | s | x
| e1 bop e2
| if e1 then e2 else e3
| let [rec] x = e1 in e2
| e1 e2
| fun x -> e
| (e1,e2)
| C e
| match e with p1 -> e1 | ... | pn -> en
bop ::= (* binary operators *)
| + | - | * | > | < | = | >= | <= | <> | ^
p ::= (* patterns *)
| () | i | b | s | x | C p | (p1,p2)
i ::= integers
b ::= booleans
s ::= strings
x ::= variable identifiers
C ::= constructor identifiers
```
Some of the ways that OCalf differs from OCaml are the following:
* All OCalf variants must be non-constant.
* OCalf has pairs, but not \\(k\\)-tuples for \\(k \geq 3\\).
* OCalf does not have built-in (syntactic support for) lists.
* OCalf does not have a module system.
* OCalf has no imperative features.
Some of the OCalf interpreter is already implemented for you.
Your task is to implement evaluation and type inference, as well
as to build a test suite that you will submit. You will need
to understand the big-step environment model of OCaml to complete
the interpreter. For karma, you can implement many extensions to
the interpreter.
## Objectives
* Know the design of an interpreter and REPL.
* Implement a big-step environment model semantics, including closures.
* Better understand OCaml itself by implementing many of its main features.
* Comprehend the important building blocks of type inference, and implement
one of them.
* Formulate a test suite that can detect errors in implementations other than
your own.
## Recommended reading
* [Lectures and recitations 12–15][web]
* [Recitation 10 (Testing)][web]
* The [CS 3110 Style Guide][style]
[web]: http://www.cs.cornell.edu/Courses/cs3110/2015fa/
[style]: http://www.cs.cornell.edu/Courses/cs3110/2015fa/handouts/style.html
## Requirements
* You must implement the evaluation of OCalf expressions, as described
below under Part 1.
* You must submit a test suite for evaluation, as described below under Part 2.
* You must implement the type inference and checking of OCalf expressions,
as described below under Part 3.
* We must be able to compile and run your interpreter as described below under
"Compiling and running".
* Your code must be written with good style and be well documented.
* You must use Git (or another version control system) as part of your development
process.
## What we provide
In the release code you will find these files:
* Many `.mli` and `.ml` files, which are described below under
Part 0.
* A template file `a4.txt` for submitting your written feedback.
## What to turn in
Submit files with these names on [CMS][cms]:
* `a4src.zip`, containing your solution code and your test suite.
* `vclog.txt` containing your version control log.
* `a4.txt`, containing your written feedback.
[cms]: https://cms.csuglab.cornell.edu/
**To prepare `a4src.zip` for submission:**
From the directory that contains `main.ml`, bundle all your source code
and your test suite into a zip file with this command:
```
$ zip a4src.zip *.ml{,i,y,l}
```
Do not include any compiled bytecode files, otherwise your submission
might become too big to upload.
Double check that you got all your files with this command:
```
$ zipinfo -1 a4src.zip
```
**To prepare `vclog.txt` for submission:** Run the following
command in the directory containing `main.ml`:
```
$ git log --stat > vclog.txt
```
## Git
You are required to use [Git][git] (or another version control system)
whether you are working as an individual or
with a partner. Throughout your development of
A4, commit your changes to a repo. Use those checkins
to provide checkpoints, in case you need to restore
your development to a previous point.
Synch with a remote repo to communicate code between you
and your partner, or simply to backup your development
if you are working as an individual.
**Private repos are of the utmost importance.
A public repo would share your code with the entire world, including your
classmates, thus violating the course policy on academic integrity.
Therefore we require that you keep all your CS 3110 related code in private repos.**
[git]: https://git-scm.com/
## Grading issues
* **Compiling and running:** Your interpreter must compile
according to the instructions given below in Part 0 under
"Compiling and running". You may not change the names and
types appearing in `.mli` files. Solutions that do not
obey these stipulations will receive minimal credit.
* **Code style:** Refer to the [CS 3110 style guide][style].
Ugly code that is functionally correct will nonetheless be penalized.
Take extra time to think and find elegant solutions.
* **Late submissions:** Carefully review the [course policies][syllabus] on
submission and late assignments. Verify before the deadline on CMS
that you have submitted the correct version.
* **Environment:** Your solution must function properly in the
[3110 virtual machine][vm], which is the official grading
environment for the course.
[style]: http://www.cs.cornell.edu/Courses/cs3110/2015fa/handouts/style.html
[syllabus]: http://www.cs.cornell.edu/Courses/cs3110/2015fa/syllabus.php
[vm]: http://www.cs.cornell.edu/Courses/cs3110/2015fa/vm.html
## Permitted OCaml features
Imperative data structures are now permitted. Side effects are not necessarily
bad. But we urge you to think carefully before choosing
to use imperative data structures.
## Part 0: Understand the OCalf codebase
Your preliminary task is to familiarize yourself with the structure of the
code we have shipped to you. We provide the
following modules, comprising both `.ml` and `.mli` files, in the release code:
* `Eval` and `Infer` have skeleton code for the functions that implement
evaluation and type inference. There are some unimplemented helper functions
in these files that the course staff found helpful in their own solution.
You are free to implement them, change them, or remove them
entirely.
* `Ast` and `TypedAst` contain the type definitions used in evaluation and
type inference, respectively. The `TypedAst` module also contains `annotate`
and `strip` functions for converting between the two.
* `Examples` and `Lambda` contain example OCalf expressions.
`Examples` contains many small examples from this writeup.
`Lambda` contains a large example.
* `Printer` contains functions for printing out values of various
types.
* `Parse` contains functions to construct ASTs from strings. It relies on
a lexer and parser implemented in `lexer.mll` and `parser.mly`.
**Exercise:** Skim each of the `.mli` and `.ml` files in the release code,
and plan to come back and read them in more detail later as necessary.
**Compiling and running:**
From the directory containing `main.ml`, run
```
$ cs3110 compile main.ml
```
to compile the interpreter. As the parser is compiled, it will produce a
message `1 shift/reduce conflict`. This is expected behavior.
To run the intepreter, we will use OCaml's own REPL. (For karma,
you can build a REPL for OCalf.) So after compiling, load utop.
You can use the OCalf interpreter by directly calling functions
that implement it. For example,
```
"if true then 3110 else 0"
|> parse_expr
|> eval []
|> string_of_value
```
causes the string `"if true then 3110 else 0"` representing an OCalf expression to be
parsed into an OCalf AST, evaluated to an OCalf value, then converted to an OCaml string
suitable for printing. (But note that evaluation won't yet succeed with the release
code, because `eval` is unimplemented.)
Note that the reason functions like `eval` are available in utop
is that we provide an `.ocamlinit` file in the release code.*
When utop starts, it automatically `#use`s this file, which
automatically `#load`s and `open`s many of the modules
from the release code for your convenience in interactive testing.
Whenever you find yourself typing the same thing into the REPL more than once,
consider adding it to your `.ocamlinit`!
\* Files whose names start with `.` are *hidden*;
use `ls -A` to show a directory listing that includes hidden
files.
## Part 1: Evaluation
Implement the function
```
eval : Eval.environment -> Ast.expr -> Eval.value
```
in `eval.ml`. This function evaluates OCalf expressions in the
big-step environment model semantics. For example,
```
eval [] (parse_expr "if false then 3 + 5 else 3 * 5"));;
- : value = VInt 15
```
(Note that we expose the representation of environments as association lists.
Arguably this is poor design: all the client of an environment needs to know
is that it is a dictionary, and the particular dictionary representation
is irrelevant.)
Whenever evaluation reaches a place where the semantics gets
*stuck*, meaning that evaluation could not meaningfully
proceed but the expression being evaluated is not yet a
value, `eval` should produce the special value `VError`. For
example, `3 + true` and `match 3 with | 1 -> false` should
both evaluate to `VError`.
Although you are free to tackle the implementation of `eval`
in any way you see fit, you are strongly encouraged to follow
the plan outlined below. Our test cases will proceed in the plan's
order, so following the plan will maximize your chances of
partial credit.
**Step 0:** Remind yourself of the big-step semantics of OCaml,
because you are essentially implementing that judgment.
**Step 1: Implement eval without LetRec or Match.**
Implement `eval` for unit, integers, Booleans, strings,
`BinOp`, `If`, `Var`, `Fun`, `Pair`, `Variant`, `App`, and `Let`.
**Step 2: Implement LetRec.**
Implement `eval` for `LetRec`. This is tricky, because a binding is needed in
the environment for the defined name before its definition can be evaluated.
We'll solve this problem with a technique called *backpatching*.
To evaluate `let rec f = e1 in e2` using backpatching,
first evaluate `e1` to a value `v1` in an environment where `f` is bound to
a *dummy value*, which may be any value at all.
(If the value of `f` is used while evaluating `e1`, it is an
error. This would occur, for example, if the programmer wrote
`let rec x = 3 + x in ...`.) Then imperatively update the
binding of `f` to `v1`. This "ties the
knot," allowing `v1` to refer to itself. Finally evaluate
`e2` in the environment where `f` is bound to `v1`.
To support backpatching, the `environment` type contains
`binding ref`'s instead of `binding`'s, thus enabling bindings
to be mutated.
**Step 3: Implement Match.**
Implement `eval` for `Match`. You do not need to check whether pattern
matching is exhaustive or whether there are unused match cases.
Return `VError` if pattern matching fails at run time.
## Part 2: Test suite
As part of developing `eval`, you naturally will be constructing test
cases that demonstrate the (in)correctness of your implementation. Let's
take that one step further. The course staff will develop many buggy
implementations of `eval`. Your task is to construct a suite of test
cases that finds all our bugs.
Use `test_eval.ml` from the release code as a template. Add your test suite to the file.
We will copy that file (and only that file) into each of our buggy interpreters, then
run
```
$ cs3110 compile test_eval.ml
$ cs3110 test test_eval.ml
```
We will examine the output to see whether your test suite correctly detects the
bugs in our interpreters.
## Part 3: Type inference
OCalf's type inference algorithm proceeds as follows:
1. Each node of the AST is *annotated* (aka *decorated*) with a unique type variable.
2. The AST is traversed to *collect* a set of equations between types
that must hold for the expression to be well-typed.
3. Finally, those equations are solved to find an assignment of types
to the original variables; this step is called *unification*.
If the unification phase discovers that the system is unsatisfiable,
then it is impossible to give a type to the expression, so a type error would be reported.
If the system is underconstrained, then there will be some
"leftover" variables after unification. The typechecker would give them
user-friendly names like `'a` and `'b`.
* * *
**Example.** Consider `(fun x -> 3 + x)`. The annotation phase would add a
new type variable to each subexpression:
Next, the collection phase would collect equations from each node:
* We know that `(fun x -> e) : t1 -> t2` if `e:t2 `under the assumption
`x:t1`. Stated differently, `(fun (x:t1) -> e:t2) : t3` if and only
if `t3 = t1 -> t2`. Therefore, the equation `'t05 = 't04 -> 't03` would be
collected from `fun` node.
* Similarly, the equations `'t03 = int`, `'t02 = int`, and `'t01 = int` would
be collected from the `+` node.
* The equation `'t02 = int` would be collected from the `3` node.
* We know that if `x` had type `t` when it was
bound then it has type `t` when it is used. Therefore, `'t01 = 't04`
would be collected from the `x: 't01` node.
Finally, the system of equations is solved in the unification phase, assigning
`int` to `'t01` through `'t04`, and `int -> int` to `'t05`.
* * *
We provide the annotation and unification phases of type inference
for you; your task is to implement the function
```
collect : variant_spec list -> annotated_expr -> equation list
```
in `infer.ml`. We strongly encourage you to follow this plan:
**Step 1: Implement collect without Match or Variant.**
Implement `collect` for all syntactic forms except `Variant` and `Match`.
**Step 2: Partially implement collect for Match.**
Implement `collect` for `Match`, but omit handling of `PVariant` patterns.
Make sure the bindings from the patterns are available while checking the bodies
of the cases.
**Step 3: Implement collect for Variant.**
Extend `collect` to handle variants.
Deriving the correct constraints for variants is tricky.
Consider this code:
```
type 'a list = Cons of 'a * 'a list | Nil of unit
let x = Cons(1,Nil ()) in
let y = Cons(true,Nil ()) in
42
```
An overly-strict implementation of collection might report a type error,
if collection generates constraints that force the separate occurrences of `Nil` and `Cons`
in binding `x` and `y` to have the same types.
A better implementation would generate constraints with distinct type variables,
thus permitting the code above.
## Karma
Implementing some of these karma problems might require changing the names or
types that appear in the `.mil` files in the release code. Do so with care.
Adding a new function to an interface file will not break our testing harness,
but changing the name or type of a previously existing function could, especially
`eval` and `collect`. Likewise, extending the AST types will not break our
testing harness, but changing the name of an existing constructor or the data
it carries would. If in doubt, please ask!
**REPL:**
Implement an OCalf REPL. Like utop, your REPL could provide various
directives, syntax highlighting, tab completion, etc.
**Pervasives:**
In the same way that OCaml's `Pervasives` module is automatically opened,
automatically populate the initial OCalf environment with useful functions.
A good place to start would be functions for I/O (e.g., the
`print_* and read_*` functions in OCaml's `Pervasives`).
You will need to extend the OCalf notion of values to include functions
that are provided by OCaml, rather than implemented in OCalf.
In this, you could be inspired by OCaml's [external definitions][external].
Instead of externals naming C functions, though, OCalf externals could name
OCaml functions.
[external]: http://caml.inria.fr/pub/docs/manual-ocaml/intfc.html#external-declaration
**Lists:**
Add syntactic support for lists to the OCalf lexer and parser, along with a
built-in `list` type constructor. Implement a collection of functions
similar to the OCaml `List` module in OCalf.
**let-polymorphism:**
**Caution:** This karma problem is likely to cause you to reorganize your code.
Make sure you make good use of source control, helper functions, and unit tests
to ensure that you don't break your existing code while working on it!
The type inference algorithm described above allows us to give expressions
polymorphic types: `fun x -> x` will be given the type `'a -> 'a`.
But it does not let us use them polymorphically. Consider the following
expression:
```
let (any:t1) = (fun (x:t3) -> (x:t4)):t2 in (any 1, any "where")
```
OCaml will give this expression the type `int * string`, but our naive
inference algorithm fails.
The solution is *let-polymorphism*:
Every time a variable defined by a `let`
expression is used, we should create a new type variable corresponding to the use. We
use the constraints generated while type checking the body of the `let` as
a template for a new set of constraints on the new variable.
In the above example, while type checking the body of the let expression, we
will generate the constraints `t1 = t2`, `t2 = t3->t4`, and
`t3 = t4`. While checking the expression `(any:t5) 1`, we can
recreate these constraints with new type variables `t1'`, `t2'`, and
so on. We will output the constraints `t1'=t2'`, `t2' = t3'->t4'` and
`t3'=t4'`. We also will output the constraint `t5=t1'`.
Because the type variables are cloned, they are free to vary independently.
Because we have cloned their constraints, they must be used in a manner that is
consistent with their definition.
**Meta:**
Implement an OCalf interpreter in OCalf.
Start by defining an OCalf variant to represent OCalf ASTs.
Then implement an `eval` function in OCalf to evaluate
OCalf expressions. Since OCalf doesn't have imperative
features, you won't be able to implement `let rec` with
backpatching. Instead, you could use a technique
called *recursive closures*, which is described in
the [final two slides of a 3110 lecture from Spring 2015][rec-cl].
Note those slides use a different notation than we used this semester.
[rec-cl]: http://www.cs.cornell.edu/Courses/cs3110/2015sp/lectures/8/lec08.pdf