IC is a simple language for which we will be writing compilers in CS413. This language definition will be updated periodically as the need for clarifications or corrections arises. If you see ambiguity in this specification, you should contact the course staff to resolve ambiguities.
A program consists of a collection of one or more modules. Each module
consists of two files: an interface file and an implementation file. The
implementation file provides the actual code for the module, and the interface
file declares the parts of the module which are externally visible, i.e. the
variables and functions from that module which are accessible from other
modules. A module must define everything that its interface declares, but it
may also define additional variables and functions that are not accessible
from outside the module. The filename extension for implementation files is ii
and for interface files ic
(a module mod
will have
an interface file mod.ii
, and an implementation mod.ic
).
<
'
and '>
'. Keywords and terminal characters are written in bold
fonts (e.g. while, <=
). Terminals which represent
tokens are written in bold italic type (e.g.,
string
). Brackets [ ] indicate an optional item in the production
and parantheses ( ) are used for grouping. Quoted brackets and quoted
parantheses '[', ']', '(', ')' are terminal characters. Finally, a group of
items followed by an asterisk (item)*
is the Kleene closure, i.e.
a sequence of zero or more items.
The non-terminal
<module> ::= [<uses>] <definition>*
<uses> ::= uses <component> (, component)*
<component> ::= id . id
<uses>
represents a declaration that the code of
this module will use several external components. Each component is of the form mod.i,
denoting the fact the the identifier i from module mod may be
used in the current module under the name i.
A module contains one or more definitions of variables or functions. All top-level definitions are visible throughout the module, regardless of where they are defined. Two definitions may not use the same name; nor may they conflict with any identifier defined by a use declaration.
A function definition may define two kinds of functions: functions that return a value, and functions that do not. Functions of the former sort are indicated by writing the type of their return value after the closing parenthesis of the formal arguments. In both case, the implementation of the function is described by a block containing a list of one or more statements between enclosing parantheses '{' and '}'. The list of formal arguments of a function specifies the name and the type of each argument; a function may be called only if the actual arguments to the function have a type that is the same as the type of the formal arguments.
<definition> ::= <type> id
| function id : '(' [<formals>] ')' [-> <type>] <block>
<formals> ::= <formal> (, <formal> )*
<formal> ::= <type> id
<block> ::= { (<statement>)* }
<interface> ::= <declaration>*
<declaration> ::= <type> id
| function id : <signature>
<signature> ::= '(' [<type_list>] ')' [-> <type>]
<type_list> ::= <type> [id] (, <type> [id])*
There are three primitive types: integer, boolean, and string. Also, for any type
T,
the type
array[
T]
represents
an array of that type. Using this recursive type definition, array[array[int]]
is an array of an array of integer, i.e. an integer matrix.
<type> ::= int | bool
| string | array '[' <type> ']'
The possible statement forms are shown below.
<statement> ::=
| <type> id [= <expression>] ;
| <function_call> ;
| <lvalue_expression> = <expression> ;
| if '(' <expression> ')' <statement> [else <statement>]
| while '(' <expression> ')' <statement>
| break ;
| continue ;
| return ;
| return expr ;
| <block>
A new variable may be introduced by a statement, as in the first production. As in Java, this variable remains in scope in subsequent statements until the closing brace of the tightest enclosing pair of braces that surround this statement. The new variable may optionally be initialized with an assignment to an expression. If not, the value of the variable is undefined until its first assignment.
An if statement evaluates the initial expression and executes
the following statement if the expression evaluates to true. The expression
must have type bool. If the expression evaluates to false, the
else
clause is evaluated if present.
A while statement operates as in Java. The expression
tested must have boolean type. A break
or a continue
statement must belong to an enclosing while
statement. The break
statement stops the execution of the innermost while loop; the execution of the
program continues with the next statement after the loop body. The continue
statement stops the execution of the current iteration of the innermost while
loop; the execution of the program continues by testing the loop condition.
A return statement may either return a value or not. A return statement in a function with no return type may not return a value; a return statement in a function with a return type must return a value. A return statement has no value itself.
Finally a block of statements is a statement itself, which means that blocks
of statements can be nested.
Expressions may be of one of the following forms:
<expression> ::= <simple_expression>
| <binary_expression>
| <unary_expression>
| <constructor_expression>
| <constant_expression>
| '(' <expression> ')'
<simple_expression> ::= <lvalue_expression>
| <function_call>
<lvalue_expression> ::= id
| <simple_expression> '[' <expression> ']'
<binary_expression> ::= <expression> <binary_op> <expression>
<unary_expression> ::= <unary_op> <expression>
<constructor_expression> ::= new <type> '[' <expr> ']'
<constant_expression> ::= string
| integer
| true
| false
There are three kinds of constants : integer constants, string constants,
and boolean constants. The forms of integers and strings are described
below in Lexical considerations. The non-terminal <simple_expression>
represents
an expression with high precedence -- an expression that can be used as
the left-hand-side of an array index; these expressions include variable
names, array index expressions, and function call expressions. The non-terminal <lvalue_expression>
represents expressions that can occur in the left hand side of an
assignment; these can be either variable names or array index expressions.
Operators may be either arithmetic operators, comparison operators or conditional operators. Operators have the same precedence and associativity that they do in Java, and unless specified, the same meaning.
<binary_op> ::= <arithmetic_op>
| <comparison_op> | <conditional_op>
<arithmetic_op> ::= + | - |
*
| / | %
<comparison_op> ::= < |
> | <= | >= | == |
!=
<conditional_op> ::= \/ | /\
<unary_op> ::=
- | lengthof
Arithmetic operators include addition, subtraction, multiplication, division, and the modulo operator. All of these operate only on integers, with the exception of + which is also used to concatenate two strings. A divisor or modulus of zero is checked, and results in the program terminating.
Comparison operators test the relation between their operands: strictly less than, strictly greater than, less than, greater than; these operators may be used to compare integer numbers or to compare strings according to their lexicographic order. The other two comparison operators are the equality test operators: equals and not equals. These operators can be applied to any variables of the same type. They work as follows: for integers and boolean they test the equality of values; for strings (or arrays) they test for the equality of the string (or array) object.
The conditional operators are the short-circuit and operator '/\
' and the
short-circuit or operator '\/
'; they have the same semantics as the equivalent &&
and ||
operators in Java. Finally, unary operators include sign
change for integers and computing the length of an array.
Brackets may be used to index both arrays and strings, extracting either the component of the array with that index, or the ASCII code of the character with that index, respectively. Both arrays and strings are zero-indexed, so the largest index that may be used is one less than the length of the array or string. Out-of-bounds errors are checked for and terminate the program with an error if encountered.
A new array object is created by an array constructor expression. The keyword
new
is
followed by element type and the number of elements in the array. For example, the expression
new int[5] creates an array[int] containing 5 elements.
A function call looks the same as in Java. The optional list of actual expressions is separated by commas. The types of the expressions must match the declared types of the corresponding formal arguments in the interface of the module that the identifier comes from.
<function_call> ::= id '(' [<actuals>] ')'
<actuals> ::= <expr> (, <expr>)*
Since the scope of each variable temp is limited to its respective block of statements, the variables may have the same name.{ int tmp = x1; x1 = y1; y1 = tmp; } { int tmp = x2; x2 = y2; y2 = tmp; }
Integer literals (integer
) are either 0 or a digit in
the range 1..9, followed by a sequence of digits in the range 0..9. Negative
integers are expressed as a unary negation operator followed by a positive
integer constant. Integers in 32-bit values.
The largest integer literal is 2147483648 (231). All integer
literals from 0 to 2147483647 may appear anywhere an integer literal may
appear, but the literal 2147483648 may appear only as the operand of
the unary negation operator -.
String literals (string
) are largely as defined for Tiger (Appel,
p. 526). The escape sequence \^
c, where c
is an arbitrary character, stands for a control character. To clarify this
definition, c may be any character whose ASCII code is in the range
64..126; the escape sequence stands for the control character whose ASCII code
is in the range 0..31 and is equal to the ASCII code of c, modulo 32.
Boolean literals must be either the keyword true or the keyword false.
Identifiers must begin with an alphabetic character. Following the initial
alphabetic character may be any sequence of alphabetic characters, numeric
characters, or the underscore character ( _ ). Uppercase and lowercase
alphabetic characters are both considered alphabetic and are distinguished,
so x
and X
are different identifiers.
Keywords in the language (int,
string, bool,
array,
uses, function
, if,
else,
while, break
, continue
, return,
new,
lengthof,
true
and false) may not be used as identifiers.
White spaces consists of a sequence of one or more space, tab, or newline characters. White spaces may appear between any tokens. Keywords and identifiers must be separated by white space or a token that is neither a keyword or an identifier. For instance, elsex will represent a single identifier, not the keyword else followed by the identifier x.
function main : (array[string] args) -> intThere must be a function named main in exactly one of the modules in the program and it must have exactly this signature. When the program is run, this function will be evaluated, passing any command-line arguments to the argument variables args. The expression (lengthof args) can be used to determine the number of arguments. The return status of the program is defined by the return value of main.
function print : (string s) // output the string s to the standard output device function printi : (int i) // output the integer i as a formatted integer to standard // output. Equivalent to print(itos(i)) function printc : (int c) // output the character whose ASCII value is c to standard output function readln : () -> string // read a line from the standard input device function readc : () -> int // read a single character from standard input function eof : () -> bool // return whether the end of file has been reached on standard input
function itos : (int i) -> string // Produce a formatted version of i. function stoi : (string s, int err) -> int // Parse s as a formatted integer. // Return err if the string could not be parsed. function itoc : (int i) -> string // Produce a string containing the single character whose ASCII code is i. function atos : (array[int] a) -> string // Produce a string containing the characters whose ASCII codes are in a, // in proper sequence function stoa : (string s)-> array[int] // produce an array containing the ASCII codes of all the characters in s, // in proper sequence
quicksort.ii:
function quicksort : (array[int] a, int l, int h) /* Put the array elements a[l]...a[h] into ascending order. Requires that 0 <= l <= h <= lengthof a */
quicksort.ic:
function quicksort : (array[int] a, int low, int high) { if (low < high) { int mid = partition(a, low, high); quicksort(a, low, mid); quicksort(a, mid + 1, high); } } function partition : (array[int] a, int low, int high) -> int { /* Reorder the elements in a into two contiguous groups. If ret is the return value, then the first group is the elements in low...ret, and the second group is the elements in ret+1...high. Each element in the second group will be at least as large as every element in the first group. */ int x = a[low]; int i = low; int j = high; while (true) { while (a[j] > x) { j = j - 1; } while (a[i] < x) { i = i + 1; } if (i < j) { int tmp = a[i]; a[i] = a[j]; a[j] = tmp; } else { return j; } } return 0; }