This problem set will have two parts. The second part will be release in approximately one week. You will hand in both parts at once, as a single problem set.
You have longer to do this problem set than you did on previous problem sets. This is not an accident; this problem set is significantly more work. So get started early!
You may do this problem set alone or in pairs. You will submit the files you changed through CMS.
In this problem set you will explore two much more sophisticated forms of program analysis than you did in PS#4. The basic data structures used (namely, AST�s) will remain the same; however, the analysis will be much more sophisticated, and provide an opportunity to explore interesting algorithms and data structures.
We have provided you with a main loop for your analysis routines, much like the optimization main loop in PS#4.
In order to minimize the ambiguities in this problem set, we have provided a set of examples showing how your code should behave. These examples have actually been run using our solutions. If there is sufficient demand we will add additional examples over time.
The first task that we will consider in this problem set centers on the notion of a call graph. Recall that a graph is a set of vertices (sometimes called nodes) connected by edges. The graph that we will consider involves directed edges, which means that we consider the edge that goes from v1 to v2 to be different from the edge that goes from v2 to v1.
In a call graph, the vertices represent functions. There is an edge from (the vertex representing) function1 to (the vertex representing) function2 when the function1 calls function2. For example, consider the SML expression:
let
fun ford(x) = x-20
fun arthur(y) = ford(y)*10
fun marvin(z) = arthur(z)+2
in
marvin(24)
end
This expression has the following call graph:
Note that there is no edge from marvin to ford, because marvin does not
call ford (at least, not
directly).
Furthermore, the expression above only calls
one function directly, namely marvin. In contrast, the expression below also calls ford
directly.
let
fun ford(x) = x-20
fun arthur(y) = ford(y)*10
fun marvin(z) = arthur(z)+2
in
marvin(ford 44)
end
In order to construct a call graph, we need
to specify what it means to say that a function f calls a function g. Ideally, we would say that f calls g
if at runtime f will invoke the
procedure g. This, however,
is clearly uncomputable, as shown from the following example:
fun f(x) = if (ford(x) = 42) then g(x) else 6*9
In order to figure out if f is actually going to call g we need to know the value of x, which we only know at runtime. Furthermore,
determining the return value of some arbitrary function like ford
is well known to be uncomputable.
As a result, we will use a sound (but
conservative) compromise: when we say that f calls g,
what we will mean is that f
could potentially call g
directly. To put it another way, this means that we cannot prove that f will not directly call g at run time. This definition is sound, in that if f
actually calls g at run time, then under our definition we say that f calls g.
It is conservative, because it is possible for us to say that f calls g when
at run time f decides not in fact
to call g. The advantage
of this definition is that it is easy to compute: we simply test if the
identifier g appears anywhere inside the body of the definition of
f.
In order to make this problem considerably easier, you can make the
following assumptions about the input expression that you will optimize:
Alpha renaming: In your solutions, you can assume that alpha renaming has already occurred (i.e. that any two identifiers with the same name are in fact the same).
Limited input expressions: You can assume that your input expression is in the same
form as the two let
expression examples given above. More
precisely, the input expression is a let
expression containing fun declarations (no
val
declarations), and there are no other let or fn expressions anywhere within the input expression.
Simple syntax for mutual
recursion: To define mutually
recursive functions in SML it is necessary to use the keyword and
(not to be confused with andalso). However,
in order to simplify this problem set we will assume that functions defined
with fun in a let can be
mutually recursive.
Thus, for example, your code
should handle the following mutually recursive pair of functions:
let
fun take(L:int list):int list =
if L = nil then nil else hd(L)::skip(tl(L))
fun skip(L:int list):int list =
if L=nil then nil else take(tl(L));
in
take [1,2,3]
end
In order for this to work in
SML, the second fun (shown in bold) would need to
be replaced by and.
In this problem set, you will modify the included file callgraphs.sml. This is the only file you should modify.
The first call graph problem involves detecting
and eliminating dead code. More precisely, there can be functions that are
declared that are not used in the body. As a simple example, consider:
let
fun ford(x:int):int = x-20
fun arthur(y:int):int = (y-20)*10
fun marvin(z:int):int = arthur(z)+2
in
marvin(24)
end
In this expression, the function ford is never called. Obviously, it would be desirable to
detect such �dead� functions (and perhaps rewrite the code to remove them).
In this problem you are to detect and remove
dead functions. The two procedures that you need to write are listDeadFunctions and removeDeadFunctions. Their signatures are provided in callgraphs.sml, and
the analyzer will call them and report on their results. Note that the
input will only be expressions in the restricted form that we use for call
graph analysis. Be sure that your code works correctly for recursive functions.
Problem CG-1.1: Write a function listDeadFunctions which
takes an expression and returns a list of strings, which are the names of the
dead functions.
Problem CG-1.2: Write a function removeDeadFunctions which
takes an expression and returns a new expression, which has removed the dead
functions.
Recursive functions, and especially
mutually recursive functions, cause a fair amount of complexity. For example,
it is not possible to inline such functions. In this problem, you will detect
recursive functions. Note that in this definition, we consider not only the
functions that f calls directly, but also that it calls indirectly. For
example, if f calls g, which in turn calls f, then f is recursive.
To handle this, we need a
definition of what it means for a function to be recursive. In classic CS312
fashion, this definition (of course!) will be recursive.
More precisely, we define a
function f to be recursive if f is in callable(f), where:
Problem CG-2: Write a function listRecursiveFunctions which
takes an expression and returns a list of strings. Each element of the output
will a function which calls itself, either directly or indirectly. Note that
the input will only be expressions in the restricted form that we use for call
graph analysis.
Functions which cause side effects are even more difficult to optimize than recursive functions. Not only are they impossible to inline, but they are extremely difficult to reason about. For this reason, we will attempt to detect such functions.
As usual, we have the problem that it is uncomputable to tell whether or not a function will actually cause a side effect at run time. However, there is a simple syntactic test we can do, which is sound but conservative. If a function ever uses the assignment operator :=, then we will assume that it causes side effects.
In this problem, your task is to detect all functions that cause side effects, either directly (by using the assignment operator) or indirectly, by calling some other function which causes side effects.
Problem CG-3: Write a function listImperativeFunctions which takes an expression and returns a list of the names of functions which could cause side effects. Note that the input will only be expressions in the restricted form that we use for call graph analysis.