CS212 S99 Problem Set 6

CS212 Problem Sets
Problem Set 6 - Spring 1999
Interpretation and Compilation

Issued: Wednesday, April 21
Due: Monday, May 10, no later than 5pm

Introduction
Part I
Part II
Part III
Part IV
Part V

Introduction

The goal of this problem set is to turn the Tiny Scheme implementation, discussed in class, into a Small Scheme and eventually Medium Scheme implementation. In so doing, you will learn about the structure of an interpreter, the structure of a compiler, and some other aspects of language implementation. In particular, you will augment the provided interpreter and compiler to support many special forms such as letrec, cond, let*, delay, etc. You will also build a primitive debugger for the interpreter. Finally, you will augment the compiler to provide some primitive object support in a message-passing style.

The Tiny Scheme language is a small, core subset of Scheme. The semantics of this core is exactly the same as for Scheme. The same will hold for any language constructs you add to the implementation. So, if you are in doubt about how something should behave, read the R5RS documentation and play with DrSwindle to see how things should evaluate. Also, refer to the notes on the Environment Model.

Clearly, this problem set involves a lot and as always, you want to get started on it as soon as possible. Also, make sure you read and understand all of the code supplied to you. Think before you start coding and your life will be a lot simpler.

PART I: Adding Special Forms to the Interpreter

Your first step is to augment the Tiny-Scheme interpreter so that it supports more language constructs. In particular, you are to add methods to ts:eval-combination to support the following constructs:

(display e), (newline)
display should take any value and display the value. newline should print a "carriage-return". These routines are useful for debugging and demonstrating that subsequent routines work. HINT: adding support for these is very easy.
(set-car! e1 e2) and (set-cdr! e1 e2)
Within the interpreter, you may assume that set-car! and set-cdr! are always applied to two sub-expressions. The interpreter should evaluate both sub-expressions. If the first is not a pair, then an error should be signalled. Otherwise, the car or cdr field should be updated with the value of the second expression as appropriate. HINT: adding support for these is very easy.
(begin e1 e2 ... en)
Within the interpreter, you may assume that begin is given at least one sub-expression. The behavior should be exactly as in Scheme.
(letrec ((x1 e1) (x2 e2) ... (xn en)) e)
Within the interpreter, you may assume that the list of bindings is non-empty (i.e., n > 0) and that the left-hand-sides are symbols. You may also assume that the body of the letrec is one expression instead of the more generalized form of Scheme. The behavior should be exactly as in Scheme and as described in the Environment Model.
(quote e)
Within the interpreter, you may assume that quote is applied to exactly one sub-expression. The semantics of quote dictates that you not evaluate e to a value v. Rather, (quote e) evaluates as follows: for all expressions e, (quote e) evaluates to e. This is if e is a number, symbol, or string (it evaluates to that number, symbol, or string) and also if it is a list expression. If in doubt, try evaluating something in Scheme to see what it gives.
(delay e) and (force e)
To support delay and force, we must add three new kinds of object to the evaluator: <promise>s, <thunk-closure>s, and <thunk-value>s. A promise is an object with one field called the contents. The contents of a promise can be either a thunk-closure or a thunk-value. A thunk-closure is just like a closure except that it takes no arguments when it is applied. A thunk-value is an object with a value field. The value field can be any kind of Tiny Scheme value.

Evaluation of (delay e) creates a new promise object. The contents field of the promise should be initialized with a thunk-closure that takes no arguments, has body e, and the current environment as its environment.

Evaluation of (force e) evaluates the expression e and checks that the resulting value is a promise object. If the contents of the promise is a thunk-value, then the value of the thunk-value is immediately returned as the result of the expression. If the contents of the container is a thunk-closure, then the closure should be applied to get a value. This value should be placed in a thunk-value and the contents of the promise should be overwritten with the new thunk-value. Finally, the value should be returned. Clearly, you'll have to define classes for <promise>, <thunk-value>, and <thunk-closure> as part of this problem, so please turn in these definitions along with your additions to ts:eval-combination.
Here is a sample dialogue in Scheme:
```
  ==> (define z (delay (begin (print "Hello!") (newline) 3)))
  ==> z
  #<promise>
  ==> (force z)
  Hello!
  3
  ==> (force z)
  3
```
Notice that the delayed expression is not evaluated immediately. Rather, we return a "promise" to evaluate the expression if it is every forced. Because we "memoize" the value within the promise, we do not have to re-evaluate the expression if the promise is forced more than once.

WHAT TO TURN IN: You should turn in your methods for ts:eval-combination. You should also turn in a copy of a dialogue with the augmented interpreter demonstrating that your additions work. You only need to provide one or two examples per construct.

PART II: Adding Special Forms to the Compiler

The function ts:read gets called as part of the Tiny Scheme read-eval-print loop to read in an expression. In turn, ts:read calls read to get an S-expression representing a Medium Scheme program, and then calls ts:compile to translate the Medium Scheme program to something that the Tiny Scheme evaluator can handle. In addition, the compiler is supposed to check that the program is well-formed. For instance, we should not have a lambda-expression of the form (lambda (3) (+ 2 1)) because 3 is not a valid variable (i.e., a symbol). However, the current compiler is rather primitive and does not do any of these checks.

Your next step is to augment the compiler so that (1) expressions are checked for silly errors and (2) more expressions are supported through compilation to Tiny-Scheme expressions. As an example, we have added a method to compile let-expressions into applications of lambdas as described in class. The compiler also checks that the let-expression is well-formed and compiles all of the sub-expressions.

You are to add methods to ts:compile-combination to support:

error checking
Add any methods to the compiler necessary to make sure that expressions are well-formed. In general, you should reject any expression, such as (lambda (3) (+ 1 2)), that would be rejected by DrSwindle. Furthermore, you should make sure that using ts:eval will not cause a DrSwindle error to occur, except through a call to the ts:eval-error function. You should not need to modify our code to achieve this. Furthermore, all subsequent additions to the compiler should follow the same general principle and attempt to eliminate as many erroneous expressions as possible.
(define (f x1 x2 ... xn) exp1 exp2 ... expm)
Though Tiny-Scheme provides support for define-expressions of the form (define x e), it does not provide the above convenient abbreviation for definining functions.
(let* ((x1 e1) (x2 e2) ... (xn en)) exp1 exp2 ... expm)
The Tiny-Scheme interpreter does not support let*. You should compile them to some equivalent Tiny-Scheme expression and check that they are well-formed. In particular, the let* should only bind variables and the body should have at least one expression (see the next bullet also.)
Multiple expressions within bodies of (lambda ...), (let ...), (letrec ...), (let* ...), and (define ...)
The Tiny-Scheme interpreter is written such that the bodies of lambdas, lets, etc. can only have one sub-expression. Add support for multiple sub-expressions within the bodies of these special form by compiling them to begin-expressions.
(and exp1 exp2 ... expm), (or exp1 exp2 ... expm)
and (cond (test1 exp1) (test2 exp2) ... (testn expn) (else exp))
Tiny-Scheme does not directly provide and-expressions, or-expressions, or cond-expressions so you'll have to compile them down to something else. Note that for both and and or special-forms, the number of sub-expressions can be any number greater than or equal to 0. Note that for cond-expressions, the "else" clause is optional.
(cons-stream e1 e2), (heads e), (heads e)
You should provide support for streams by adding methods to compile the above expression forms. (cons-stream e1 e2) should evaluate e1 to a value v1, create a promise to evaluate e2, and produce a pair of v1 and the promise. (heads e) should return the first component of a pair (just like head). (tails e) should force the promise that is in the second component of the pair.

WHAT TO TURN IN: You should turn in your methods for ts:compile-combination. You should also turn in a copy of a dialogue with the augmented compiler demonstrating that your additions work. (Hint: set *ts:verbose* to #t to see what the compiler is producing before it is fed into the interpreter.) You only need to provide one or two examples per construct.

PART III: Adding a Debugger

In this part of the problem set, you will add a simple debugger. Whenever an error is encountered during evaluation, the function ts:error is called with the current environment. In turn, ts:error calls a function ts:debugger, passing it the environment. Modify the ts:debugger definition to implement a simple debugger. The debugger should have its own read-eval-print loop that allows the user to examine the frames and bindings of the current environment in an attempt to find and correct the error. The commands the debugger should support are as follows:

(frames) lists the frames up to the global one (as numbers, with the current frame as number 0)
(list i) lists the variables bound in frame i
(show i x) shows the value of variable x in frame i
(replace! i x e) compiles e to a Tiny Scheme expression, evaluates it in the current environment to a value v, and sets variable x in frame i to have value v.
(resume e) compiles e to a Tiny Scheme expression, evaluates it in the current environment to a value v, and returns v back into the evaluator at the point where the error occurred.
(exit) exits the debugger and returns the user back to the Tiny Scheme read-eval-print loop.

Here is an example dialogue:

TS==> (define x (let ((x 3) (z 4)) (+ x y)))
Error! y is an unbound variable!
Entering Debugger
Debug==> (frames)
0
1
Debug==> (list 0)
x z
Debug==> (list 1)
+ - * / = eq? not empty cons empty? head tail ...
Debug==> (show 0 x)
x : 3
Debug==> (show 1 +)
+ : #<primitive:+>
Debug==> (replace! 0 x (+ x x))
Debug==> (show 0 x)
x : 6
Debug==> (resume x)
Exiting Debugger -- resuming calculation
9
TS==> (define z q)
Error! q is an unbound variable!
Entering Debugger
Debug==> (exit)
Exiting Debugger -- resuming in read-eval-print loop
TS==>

WHAT TO TURN IN: All of the code that pertains to the debugger and an example transcript showing that each of the commands works as described above.

PART IV: Adding Support for Message-Based Objects

In the last part of this problem set, we will add some object support in the style of Smalltalk to Tiny Scheme. In particular, we will compile object constructs down to a "message-based" system. The object system is extremely simple and is similar to the primitive (defstruct ...) mechanism described in class. However, the implementation is far different.

In Tiny Scheme, new classes may only be defined using the form:

(defstruct class field1 field2 ... fieldn)

Classes may not inherit from other classes and classes may only be defined at the top-level. Class names and field names can be any symbol. That is, you do not have to ensure that class-names have "<" and ">" wrapped around them.

When a defstruct like the one above is evaluated, the user is supplied with a new set of functions:

(make-class e1 e2 ... en) -- creates a new class object with e1,e2,...,en as the initial values of the fields.
(class? e) -- a predicate that returns #t if e evaluates to an object of type class, otherwise false.
(get-class-field e) -- if e is a class object, returns the field.
(set-class-field! e1 e2) -- if e1 is a class object, sets its field to have value e2.

The way you are going to implement defstruct is to compile it to a sequence of definitions wrapped with a begin. The most important definition is for make-class. This definition will be a function which, when given the initial values for the fields, returns a function that takes a message as a symbol. Thus, an object will simply be a function. Based upon what the message is, the object will respond appropriately. For instance, if the object is sent the message 'get-field then it will return the value of the field field. The other definitions, for the predicate, getters, and setters, will be set up to send the appropriate messages to an object.

For example, here is how a simple point struct should be compiled -- you should generalize from this. If the defstruct is of the form:

(defstruct point x y)

then this should be translated to something equivalent to:

  (begin
    (define (make-point x y)
       (lambda (msg)
         (cond ((eq? msg 'get-x) x)
               ((eq? msg 'get-y) y)
               ((eq? msg 'set-x) (lambda (v) (set! x v)))
               ((eq? msg 'set-y) (lambda (v) (set! y v)))
               ((eq? msg 'predicate) (lambda (type) (eq? type 'point)))
               (else (error "unknown message")))))
    (define (get-point-x obj) (obj 'get-x))
    (define (get-point-y obj) (obj 'get-y))
    (define (set-point-x! obj v) ((obj 'set-x) v))
    (define (set-point-y! obj v) ((obj 'set-y) v))
    (define (point? obj) ((obj 'predicate) 'point)))

Notice that when we make a point, this returns a function. The function takes in an understands only certain kinds of messages (as symbols). In particular, it understands get and set messages for its fields, and the predicate message. For get messages, it just returns the value of the appropriate field. For set messages, it returns a function which, when given a value, sets the appropriate field to that value. For predicate messages, it takes in a type name (as a symbol) and returns true when the type is equal to the name of the class. Notice that an object should be able to handle any predicate message but should only return true when the type passed is its own type. After defining the object, we define the auxiliary functions for getting, setting, and testing the type of the object in terms of message passing.

One handy function that you'll need is symbol-append. This takes in two symbols and returns a new symbol made by concatenating the two symbols. So, for instance, (symbol-append 'get- 'point) evaluates to 'get-point (note that add is a generic function that can also be used).

WHAT TO TURN IN: You should add a new method to ts:compile-combination that implements defstruct as described above. Turn in all code related to defstruct and a transcript showing that it works.

Part V: Written Exercises -- Optimization

We could augment the compiler to optimize certain expressions before attempting to interpret them. For example, we could replace expressions of the form (if #f e1 e2) with e2. Unfortunately, in the presence of set!, it is not so easy to write an optimizer (as you will see). For each question below, we describe an "optimization" as a rewriting rule of the form e1 => e2. Note that, within an optimizer, the rewriting rule is applied to all sub-expressions in a program that fit the constraints specified.

You are to determine whether or not applying the optimization to each sub-expression is meaning-preserving. An optimization is meaning-preserving if it transforms one program into another program and the two programs have the same observable behavior. Observations include non-termination, printing, errors, the result value, etc. If the optimization is not meaning-preserving, then give an example that demonstrates how code behaves differently before and after the optimization is applied.

(+ c1 c2) => c3 where c1 and c2 are constant numbers and c3 is (+ c1 c2).
((lambda () e)) => e
((lambda (x) e) 3) => e[3/x] where the notation e[3/x] means substitute 3 for all free occurrences of the variable x within the expression e.
(let ((x e1)) e2) => e2 when e2 has no free occurrences of x.

Last Updated 4/18/99 by JGM