Problem Set 4: Code transformations and Induction
Due on October 27, 2005 at 11:59 pm.
Instructions
You will submit a total of four separate files:
part1.sml , part2.sml ,
part3.sml, and part4.txt,
Modify the .sml template files from
ps4source.zip.
As always, don't change function names, and remember:
non-compiling code will receive an automatic zero.
Do NOT compress the files
into a zip file. You must submit each file individually using CMS.
You have the option to
select a partner for this assignment. Only groups of at
most two persons are permitted. You can register your
group through CMS. Choose your partner carefully; it is
not an acceptable excuse not to submit a finished homework
because "the other partner did not work enough."
All your files should have a comment at the top with the
name, netID, and section of both students.
Preamble: Mini-ML
We have introduced the substitution model in class in order to provide an
execution model for a carefully chosen subset of SML. In
this problem we give you a SML interpreter that implements the
substitution model as we specified it in class. Parts 1,
2 and 3 are focused on extending the interpreter with several code
transformations.
The Mini-ML Language
We will focus on a small functional language specified by the
following grammar:
e ::= c (* constants *)
| id (* variables *)
| (fn id => e) (* anonymous functions *)
| e1(e2) (* function applications *)
| u e (* unary operations, ~, not, etc. *)
| e1 b e2 (* binary operations, +,*,etc. *)
| (if e then e1 else e2) (* if expressions *)
| let d in e end (* let expressions *)
| (fun f x => e) (* fun expressions *)
|
Declarations:
d ::= val id = e (* value declarations *)
| fun id(id1) = e (* function declarations *)
|
Note that Mini-ML is not a proper subset of SML, since SML
does not have fun expressions. Also, note that our language has neither case
statements, nor pattern matching. All our functions have exactly one argument,
and all let statements contain exactly one declaration.
To simplify the implementation, we will assume that the language has only two
basic types, integers and booleans. No other types can be defined, also, no type
specifications are accepted anywhere in the program.
The Abstract Syntax Tree (AST)
One of the basic problems we need to address is to find a suitable program
representation that can serve as input for our interpreter. One possibility is
to use plain-text representation. While most convenient for the user, such a
representation is not very suitable for direct evaluation. The conversion to a
more convenient representation could be implemented in SML, and it
raises many interesting questions (which are addressed in other CS courses).
These representation conversion issues are ancillary to our goal in this
problem, thus, we will represent our programs as if the conversion step has
already taken place. The representation we choose is named the abstract syntax
tree (AST) and it is based on the following type definitions:
datatype decl = VAL_D of string * expr
| FUN_D of string * string * expr
and expr = INT of int
| BOOL of bool
| FN of string * expr
| ID_E of string
| FAPP_E of expr * expr
| FUN_E of string * string * expr (* name * arg * body *)
| LET_E of decl * expr
| IF_E of expr * expr * expr
| BINOP_E of expr * string * expr
| UNOP_E of string * expr
|
First, note the and keyword that links the two datatype definitions. This
keyword makes it explicit that these two type definitions are mutually recursive
(i.e. type decl refers to type expr in its definition, and
vice-versa). Such simultaneous declarations are necessary for mutually recursive
functions as well.
We use "_D" to mark a datatype constructor for declarations and
"_E" to mark a datatype constructor for expressions. Note that
INT, BOOL, and FN do not have any special marking.
This is to emphasize their special status as both expressions and values. We
could have defined "expression" constants and "value"
constants, for example, and you should know that this is often done. Because of
the peculiar features of Mini-ML, however, it turns out that our
representation is more convenient. Make sure you do not get confused by the fact
that our "values" really are, according to their type, expressions.
Only integers, booleans, and anonymous functions can be values (i.e. results of
an evaluation).
The abstract syntax tree represents the essential structure of the program,
which is stripped by all unnecessary details. Here are a number of val
definitions in SML that encode Mini-ML programs:
val e_3 = INT 3
val e_let = LET_E(VAL_D("x", INT 3),
BINOP_E(ID_E "x", "+", INT 7))
val e_fn = FAPP_E(FN("x", BINOP_E(ID_E "x", "+", INT 25)), INT 7)
val e_fun = FUN_E("function", "x", ID_E "x")
val e_fact = LET_E(FUN_D("fact",
"n",
IF_E(BINOP_E(ID_E "n", "=", INT 0),
INT 1,
BINOP_E(ID_E "n",
"*",
FAPP_E(ID_E "fact",
BINOP_E(ID_E "n",
"-",
INT 1))))),
FAPP_E(ID_E "fact",
INT 3))
|
Mini-ML Interpreter
We have provided a file interpreter.sml that implements an
interpreter of Mini-ML. In that file you will find
implementations of unary and binary operators, substitutions and
evaluation of expressions. This code is provided to help you in two
respects: to get familiar with manipulations over the AST, and to be
able to debug your own assignment. Here are a few
examples of evaluations using the expressions defined
above:
- eval(e_3)
val it = INT 3 : expr
- eval e_let;
val it = INT 10 : expr
- eval e_fn;
val it = INT 32 : expr
- eval e_fun;
val it = FN ("x", ID_E "x") : expr
- eval e_fact;
val it = INT 6 : expr
|
The interpreter.sml file also includes the functions
printDecl and printExpr that print a declaration and
an expression, respectively. For example:
- printExpr(e_fact) "";
let
fun fact (n) = if (n = 0) then 1 else (n * (fact (n - 1) ))
in
fact 3
end
val it = () : unit
|
The file parser.sml provides functions parse and
parseString which you can use to generate a Mini-ML AST from
Mini-ML code text. Feel free to use these, but have in mind that the
parser is more strict than in standard SML (e.g. arguments to
functions cannot be specified without parenthesis). We have included some
examples in file include.sml for your convenience.
For example, you can write a Mini-ML expression in a file
"a.sml", then do the following:
- CM.make();
- open Interpreter;
- case (Parser.parse (TextIO.openIn "a.sml")) of
NONE => raise Fail " Parsing Error "
| SOME(s1) => (printExpr s1 ""; eval s1)
|
Problem 1: Code normalization
The first part of the assignment consists on normalizing Mini-ML
programs. You are going to work on two normalizations:
- Removing anonymous functions:
We would like to avoid expressions with anonymous functions.
We can change the code to add new let constructs that will
bind the functions to some name.
For example
let
val x = 4
in
(fn y => y+1) x
end
|
can be transformed to:
let
val x = 4
in
(let fun f (y) = (y+1) in f end) x
end
|
Note that Mini-ML supports two function expressions:
(fn x => e) and (fun f x => e). We will
also change fun expressions by adding a let expression like in the
example above. The advantage of fun expressions is that you already
have a name for the function, for anonymous functions you need to
give a fresh name to each function you add. For this purpose we
give you a function Given.newFun() (in file
include.sml) which returns a fresh new function name (f_i)
each time is called.
One small complication for this problem is that
since we have no pairs in Mini-ML, functions that need
several parameters are written in curried form. Ie.
anonymous functions are used to pass several
arguments. These anonymous functions are allowed. For instance:
let
fun f (y) = fn (z) => y + z
in
f 2 4
end
|
Should remain unchanged after the normalization.
You need to implement this normalization in a function
removeAnonymous: expr -> expr
|
Given an input AST, the result of removeAnonymous
is an AST that:
- Has no FUN_E expressions
- The only FN expressions in the AST are used for
functions with multiple arguments (in curried form).
- Pulling let constructs to the outer-most level of their scope:
We would like to have all let expressions at the top level.
Since Mini-ML doesn't have a top level environment, you
will produce nested let expressions, and place them at
the outer-most level of their scope. For example the
program on the left should be normalized to produce the program on
the right:
Input | Output |
let
fun g (x) =
let
val y = 3 * x
in
x + (let val w = 2 * y in w end)
end
in
(let fun f (y) = (g(y+1) + g(y-1)) in f end) 3
end
|
let
fun g (x) =
let
val y = 3 * x
in
let
val w = 2 * y
in
x + w
end
end
in
let
fun f (y) = (g(y+1) + g(y-1))
in
f 3
end
end
|
Notice, in the example above we didn't move the
declaration of y and w outside
function g: we don't lift let expressions outside
function declarations.
The main complication of this transformation is to make sure that
you replace some declaration names to avoid name conflicts. For
example:
Input | Output |
let
val x = 1
in
(let val x = 2 in x end) + x
end
|
let
val x = 1
in
let
val v_1 = 2
in
v_1 + x
end
end
|
You should use the functions Given.newVar() to create fresh
variable names, and Given.newFun() to create fresh function
names. You can assume that these fresh names are different
from any name present in the input programs.
Write your solution in the function
Given an input AST, the result of liftLet
is an AST that:
- All let declarations are group together
- All group of let declarations are either:
- At the top-level (like functions g
and f in the first example above)
- At the outer-most level of a function
declaration (like
y and w inside function g)
- Tip: to check your implementation, make sure your resulting
AST doesn't have any let expression inside another
kind of expression (such as binary operations).
- Tip: look at interpreter.sml you might
find functions that are useful and will save you some
time and work.
| File part1.sml
contains the functions you need to
implement. |
Problem 2: Lambda Lifting
Lambda lifting is another kind of normalization. The idea is to remove
the use of local function definitions. For example, consider the following SML
program:
let
fun f (x) =
let
fun g (z) = z + (x + 4)
in
g 3
end
in
f 5
end
|
We would like to extract g outside f, and simplify the program as follows:
fun g (v_1) (z) = z + (v_1 + 4)
fun f (x) = g x 3
f 5
|
To do lambda-lifting we need to:
- Assume that the code doesn't contain anonymous functions, and
that let expressions are lifted to the outer-most level of their scope.
- For each function declaration, we identify all free
variables. Free variables are variable names that are not
bounded by a declaration or an argument in the current scope. For
example, x is a free variable in the following expression:
We will replace each free variable with an additional argument to the
enclosing function, and pass that argument to every use of the
function.
This is why, in the example above, the function g now
receives a new formal argument v_1, and we passed x as an
actual argument in the body of function f.
- Replace every local function definition that has no free
variables with an identical global function. This is when we moved
g to the top level.
This step may require renaming functions to avoid name conflicts.
Since in Mini-ML we don't have a top level environment, the result of
the example above should look as follows:
let
fun g(v_1) => fn (z) => z + (v_1 + 4)
in
let
fun f (x) = g x 3
in
f 5
end
end
|
We ask you to implement the answer to this problem in several steps:
- freeVar: Implement a function
val freeVar: string * (string list) -> bool
|
Which takes the name of a variable, and a list of variable names
that are bounded. The result is true if the variable doesn't belong
to the set of bounded variables. Eg:
freeVar("x", ["y","z"]) = true
freeVar("x", ["x","z"]) = false
|
- canLiftVar: Write a function
val canLiftVar: expr * (string list) -> bool
|
Its arguments are a Mini-ML expression (e) and a list of bounded
variable names. It returns true if is any subexpression of e
can be lifted. This is true in two possible cases:
- There is a free variable in the expression e. For instance:
canLiftVar (*x*, ["y"]) = true
canLiftVar (*y*, ["y"]) = false
|
(Notice: we write *x* to simplify the presentation in this page,
you need to replace it with the full expression, in this case
ID_E("x"). We will use this same notation though
the rest of this document)
- There is a function declaration that contains free
variables with respect to the function arguments (regardless of
the set of bounded names). For instance,
canLiftVar (*let fun f x = x + y in f 3 end*, ["y"]) = true
canLiftVar (*let fun f x = x + y in f 3 end*, []) = true
canLiftVar (*let fun f x = x + y in f 3 end*, ["x","y"]) = true
|
In all these cases, y is a free variable inside the body of f,
and we would like to lift it (by adding an additional argument
in f). Essentially, this scenario applies for any
set of bounded names, what is relevant is that: a) a
function is declared, b) some
variable is used in the body of the function, and c)
that variable is not an argument of the function. If in the
examples above we add to f the argument y,
canLiftVar should return false:
canLiftVar (*let fun f x = x + y in f 3 end*, _) = true
canLiftVar (*let fun f x = fn y => x + y in f 3 end*, _ ) = false
|
- extendDecl: Write a function
val extendDecl: decl * string list -> string * decl
|
extendDecl receives a declaration and a list of new variable
names. The idea is to extend a function declaration with additional
arguments. Eg:
extendDecl (*val x = 5*,["v_1"]) = ("",*val x = 5*)
extendDecl (*val f = fn y => y + v_1*,["v_1"]) = ("f", *val f = fn v_1 => fn y => y + v_1*)
extendDecl (*fun f y = y + v_1*,["v_1"]) = ("f", *val f = fn v_1 => fn y => y + v_1*)
|
The first component of the result is the name of the function being extended (if any).
- extendBody: Write a function
val extendBody: expr * string * expr list -> expr
|
Which takes an expression, a function name, and a list of expression
arguments. It produces a new expression by passing the arguments
whenever the function name is used:
extendBody (*f 1*,"f",[*y*]) = *f y 1*
extendBody (*let fun f x = x in f 1 end*,"f",[*y*]) = *let fun f x = x in f 1 end*
|
Note in the second example the let declaration redefines the name of
f, thus f is not extended in this case. Make sure you deal correctly
with these cases.
- liftVars: Write a function
val liftVars: expr -> expr
|
That takes an expression, and recursively extends all function
definitions to avoid having any free variables. For example,
Input | Output |
let
fun f(x) = fn y =>
let
fun g(z) = (z + x)
in
(g 3 + y)
end
in
f 3 4
end
|
let
val f = fn x => fn y =>
let
val g = fn v_1 => fn z => (z + v_1)
in
(g x 3 + y)
end
in
f 3 4
end
|
Notice that we removed the fun f x declaration and
replace it by val f = fn x . This is a legal
change in this example because the function is
non-recursive. Mini-ML also allows this change for
recursive functions. This is because val declarations in
Mini-ML are like val rec declarations in SML.
You can see how this is implemented in function
eval_decl (file interpreter.sml).
In summary, liftVars should take a normalized AST
(according to what you did in part 1), and return a new
AST, where all variables inside function declarations are
bounded to a parameter for that function.
- liftGlobal: Finally, write a function
val liftGlobal: expr -> expr
|
Which takes the result of the last step and pulls all function
declarations to the outer-most level. For example,
liftGlobal applied on the previous result should return:
let
let
val g = fn v_1 => fn z => (z + v_1)
in
let
val f = fn x => fn y => (g x 3 + y)
in
f 3 4
end
end
|
As in the liftLet problem, this step may require renaming
functions to avoid name conflicts. The result of this
step is a new AST that contains all function
declarations at the outer-most level. Let expressions
may exist inside function declarations, but only to
define local primitive values, not functions.
| File part2.sml
contains the functions you need to
implement. |
Problem 3: Fully Lazy Lambda Lifting
An optimization to lambda lifting is to give full laziness. The idea is
to extract maximal free expressions, instead of only free variables.
This allows to reduce the amount of recalculation in a program.
Free expressions are program subexpressions that are
closed or only contain free variables.
For instance, in
let
fun f (x) =
let
fun g (z) = z + (x + 4)
in
g 3
end
in
f 5
end
|
x is the only free variable inside function g, and x + 4
is the largest expression that doesn't contain bounded variables (is a
maximal free expression). We would like to lift the free expression as
follows:
let
fun g v_1 (z) = z + v_1
in
let
fun f (x) = g (x + 4) 3
in
f 5
end
end
|
Instead of simply lifting the free variable x, we have added an
argument for the expression x + 4.
You will follow the same process as before to implement this part:
- freeExpr: Implement a function
val freeExpr: expr * (string list) -> bool
|
Which takes an expression, and a list of variable names
that are bounded. The result is true if the expression doesn't use
any bounded variables. Eg:
freeExpr(*x + 4*, ["y","z"]) = true
freeExpr(*x + 4*, ["x","z"]) = false
freeExpr(*let val x = 4 in x end*", ["x","z"]) = true
|
- canLiftExpr: Write a function
val canLiftExpr: expr * (string list) -> bool
|
like before, it returns true if:
- The expression contains a free subexpression that can be
lifted, or
- There is a function declaration that contains a free
subexpression (which is free with respect to the set of
arguments)
- liftExpr: Write a function
val liftExpr: expr -> expr
|
That takes an expression, and recursively extends all function
definitions to avoid having any free subexpressions.
Notice that you should be able to reuse the functions
extendDecl and extendBody from the
previous part:
val extendDecl: decl * string list -> string * decl
val extendBody: expr * string * expr list -> expr
|
The resulting AST is similar as in liftVars, except that
the extended function application may contain large expressions
(not only variables), and the function declarations are
simplified to contain smaller expressions.
Notice that liftGlobal can also be
applied on expressions returned from liftExpr.
| File part3.sml
contains the functions you need to
implement. |
Problem 4: Induction
- Using the technique shown in class, find a formula for
S3(n), the sum of the third powers of the integers from 1
through n. Prove by induction that your formula is correct.
- Use induction to show that there are at most
mh leaves in an m-ary tree of height h.
- Emily is learning proofs by induction and she has come
up with the following proof that a tree with n nodes has (n-1) edges.
Proof:
Base case: A tree with one node has zero edges because
the only edges in such a tree would form a self-loop.
Inductive hypothesis: Any tree with k nodes has (k-1) edges.
Inductive step: Starting with a tree T that has k nodes, form a tree
with (k+1) nodes by adding a new node and connecting it to a node
in tree T. This construction required adding one extra node and one
extra edge, so the theorem is true for this tree as well.
Conclusion: By induction, we can conclude that a tree with n nodes has
(n-1) edges.
- Is this is a valid proof by induction? If not, what is the problem with
the proof.
- Is the theorem correct? If so, give a valid proof by induction
of the theorem.
- Consider the following datatype.
datatype 'a tree = Empty | Node of 'a tree * 'a * 'a tree
|
Given the function rev defined below - intuitively, this creates
a mirror image of the input tree.
fun rev Empty = Empty
| rev (Node(t1,n,t2)) = Node(rev(t2),n,rev(t1))
|
Use structural induction to prove that rev(rev(t)) = t.
| Write up your solutions in a text
file part4.txt. |