Extend your compiler
We are going to be using just a subset of Bali for this assignment --- a different subset than that used for Part 2. The Part 3 subset includes functions and global variables. A single program can include multiple functions, but there is still a main function that is called automatically when a Bali program is run. Here's a sample program:
# Sample Part 3 program that does a factorial calculation : int main ( ): int n : n = readInt; loop while n >= 0; print factorial(n); n = readInt; endloop return 0; end int factorial (int i) : : if i < 2 then return 1; endif return i * factorial(i - 1); end
The specifications for the part of Bali being used for this assignment appear in Bali Specs for Part 3.
For this assignment, you must also include five to ten Bali-Part3 programs that we can use as input to your compiler. The idea here is (1) to encourage you to think about test-cases before you complete your coding and (2) to give you a chance to show the graders, "Look, my compiler may not work on all your test cases, but it does work on these." Each of your Bali programs should be clearly labeled to show what it's testing and the result expected when the program is run.
For this version of Bali, at each point in the Bali-code, there are at most two active namespaces: a global namespace and a local namespace. The global namespace holds global variable names, function names, and type names (for Part 3 the only types are int and boolean, but there will be more types for Part 4). The local namespace only exists within a function; it holds local variable names and parameter names. Each namespace corresponds in a straightforward way to a symbol table. To determine the meaning of a name (that is not a syntactic type), your compiler code should first check the local namespace and then, if the name hasn't been found, your code should check the global namespace.
To give you an idea of where we're headed, in the final version of Bali (for Part 4), there will be at most three active namespaces: local level, class level, and global level. Within a method, a name can be valid at any of the three levels (e.g., a name can be a local variable within a method, a name can be a field of the current class, or a name can be a global variable or a function). Within a function, a name is valid at just 2 levels (e.g., a name can be a local variable or a name can be a global variable).
The Stack is used to keep track of the function calls. Each time a function is called a new stack frame is created. The FBR (Frame Base Register) is used to indicate the current frame. Recall that the FBR is used in the SaM instructions PUSHOFF and STOREOFF. PUSHOFF offset pushes the value stored at FBR+offset onto the Stack. STOREOFF offset stores the value on top of the stack into location FBR+offset.
What is kept in a function's frame? Here's a picture of a stack frame. The stack grows toward the top. See the October 10 lecture for details.
local variables |
saved PC |
saved FBR |
parameters |
return value |
The local variables are the function's local variables. The saved PC indicates where in the code we are supposed to jump when the function is done. The saved FBR indicates the value the FBR should have when this frame is destroyed (i.e., the previous stack frame). The parameters are the function's parameters and the return value is the function's return value.
The caller (i.e., the code that calls the function) and the callee (i.e., the function itself) have joint responsibility for building the stack frame when the function is called and for clearing the stack from when the function is done. See the SaM Simulator page for details on the instructions used below.
Code Pattern for Caller | ||
---|---|---|
Example Bali code: func(exp1, exp2, exp3); | ||
ADDSP 1 | Push space for return value | Frame Creation |
code for exp1 |
Push arguments | |
LINK | Create new frame (update FBR, pushing old FBR onto stack) |
|
JSR func | Jump to callee, pushing return address onto stack | |
UNLINK | Restore the FBR (from stack) | Frame Clean-Up |
ADDSP –3 | Clear the arguments from stack |
Code Pattern for Callee | ||
---|---|---|
Example Bali code: |
||
ADDSP v | Reserve space for local variables on stack | Frame Creation |
code for statements | Function Execution | |
code for exp | ||
JUMP endfunc | ||
endfunc: STOREOFF –4 | Store the return value in stack frame | |
ADDSP –v | Clear local variables from stack | Frame Clean-Up |
JUMPIND | Return to caller (using return address stored in frame) |
The example shown in Code Pattern for Callee is a particularly simple example in which the only return-statement appears at the end of the function's code. Since a return-statement can occur anywhere in a function, you'll need to either make the clean-up code appear as part of the code you generate for a return statement or you'll need to include a "jump to clean-up code" instruction in the code you generate for a return-statement.
Note that the sam-code for a program must initially follow the Code Pattern for Caller in order to set up the stack frame for the main function. Of course, the main function itself must follow the Code Pattern for Callee.
You will need a symbol table for each function. The symbol table for a function holds, for each variable, the variable's location (i.e., the offset from the FBR) and the variable's type. Note that the word variable here is being used to denote both parameter variables and local variables. A function's symbol table can be discarded after the code generation for the function is complete.
Note that you need to know all the function names before you start generating code; this is so that it's possible to have a function A that contains a call function B, while function B contains a call function A. Information about functions can be stored in the global symbol-table. You should generate an appropriate error message if a Bali program contains a call to a nonexistent function.
For this assignment, we will test your compiler's response to errors in supplied Bali programs. There are three kinds of errors that can occur in Bali programs:
For example, a missing semicolon or an unmatched parenthesis is a syntax error; a type mismatch or an undeclared variable is a semantic error, and an attempt to divide by zero is a runtime error.
Note: You are not responsible for handling runtime errors. It's possible to show (CS 381) that there is no way to detect all division-by-0 errors (or other typical runtime errors) without actually running the program. If a runtime error occurs then we depend on SaM to detect it and provide an error message.
For the errors that you do handle, rather than printing an error message to System.out (or System.err), errors in supplied Bali programs can be handled most clearly and cleanly by using exceptions. We have supplied exception classes corresponding to the two kinds of errors that can occur during compilation:
BC.java, the main program that runs your compiler, is designed so that it catches BaliExceptions and then prints the exception's message.
Note that these classes have changed since Part 2: each class now resides in the bali package.
Note that according to the grammar, a function call and an assignment statement can both start with a set of tokens that look like an reference. There is no way to tell which one you are parsing until you get to the equal sign (=) or the semicolon (;).
The easiest way to handle this is to start parsing as if you are parsing reference. Once the reference is complete you can check for the equal sign (=). At this point, if you find an equal sign, you need to re-examine the AST you just built for the reference to see if it is appropriate for use as the target of an assignment statement. Your compiler should throw a BaliSemanticException if the reference is inappropriate as a target.
Good Java programming practice dictates that we make use of Java packages for a programming project of significant size. To this end, we have changed some of the files we provide so that they reside in the "bali" package. Java requires that the package statement at the beginning of a file must correspond to where the file is stored in the file system; in other words, you'll need to create a bali folder to hold any files that are part of the bali package.
We encourage you to organize your compile project into appropriate packages. You are welcome to use the bali package and you may want to consider creating subpackages. If you haven't used packages before, you may want to take a look at, for instance, the Java Package Tutorial.
These provided files should not be changed. The exception is BaliCompiler.java; it is expected that you will replace this file with your own version.
We recommend carefully planning how you plan to extend your compiler before you begin coding. If you wish to do so, you may submit a document explaining your plan and sign up for a meeting slot with a TA to go over your ideas. For this assignment, the Design Document is encouraged, but not required. Note that the Design Document has an earlier due-date than the compiler.
Write a Bali compiler that translates Bali code (satisfying the Bali Specs for Part 3) into valid sam-code. You may use any of the sam-code instructions; these are summarized on the SaM Simulator page.
/********************************** * Assignment #0: Example format * Date: 1/1/1111 * * Don Key: dk0 * Elizabeth (Liz) Ard: ea0 ***********************************
Submission Instructions for Part 3:
Warning: Do not spend time on the bonus work unless your compiler already performs as required for Part 3.
Detect and report multiple errors in Bali programs. We have provided the MultipleBaliException class for those groups who wish to try detecting multiple errors. Suggested usage instructions have been provided in this file. The basic idea is to catch and accumulate errors, continuing to parse after each error.
As mentioned in lecture, it's easier to handle multiple semantic errors. How does one resume parsing after a syntax error? Consider ignoring tokens until you reach a token that you "recognize". Partial bonus credit is available for those who handle multiple semantic errors, but not multiple syntax errors.