Computer Science 5641
Compiler Design
Project Part 4 - Symbol Table and Type Checking (50 points)
Due Thursday, December 2, 2004
Introduction
In this part of the project you will build a symbol table that connects each
use of a name with its declaration, following the rules described below. You
will then build a type checker for the resulting annotated AST.
In doing this you should make use of the parser and AST you implemented
in part 3 of the project.
Name Rules
For the purposes of building your symbol table, you should apply the following
rules:
- A name exists in a scope from where it is defined to the end of that
scope.
- Each program has a global scope where global variable declarations and
function definitions are inserted.
- Each function introduces a new scope that contains both the parameters
of the function and any variables declared within the function body.
- A new scope is introduced by each block surrounded by curly braces
({ ... }).
- A name may be declared at most once in each scope; declaring the same
name a second time in that scope is an error.
- A use of a name corresponds to the most closely nested declaration of
that name.
- Every variable declared with a struct type has all of the fields listed
in the structure.
- The struct operator (.) combines a variable of a structured type (e.g., variable x) with a field of that structure (e.g., field f), as in x.f. Note that a field of a structure may itself be of another (previously declared) structured type.
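The scope rules above can be sketched as a stack of dictionaries. This is only an illustration in Python, not a required design: the class name SymbolTable, its method names, and the error message wording are all assumptions made for the example, and struct fields are left out for brevity.

```python
class SymbolTable:
    """Stack of scopes; each scope maps a name to its declaration number.

    Hypothetical helper for illustration; names and messages are assumptions.
    """

    def __init__(self):
        self.scopes = [{}]   # scopes[0] is the global scope
        self.next_id = 1     # declarations are numbered starting with 1
        self.errors = []

    def enter_scope(self):
        # Called on entering a function or a { ... } block.
        self.scopes.append({})

    def exit_scope(self):
        # Names declared in the scope stop existing at its end.
        self.scopes.pop()

    def declare(self, name):
        scope = self.scopes[-1]
        if name in scope:
            # A name may be declared at most once per scope.
            self.errors.append(f"multiple declaration of '{name}'")
            return scope[name]
        scope[name] = self.next_id
        self.next_id += 1
        return scope[name]

    def lookup(self, name):
        # Search from the innermost scope outward, so a use binds to
        # the most closely nested declaration.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        self.errors.append(f"undeclared name '{name}'")
        return None
```

Because declarations are entered in source order as the AST is walked, a use that appears before its declaration in the same scope is not found, which matches the rule that a name exists only from its point of declaration onward.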
Type Checking
Once you have resolved all of the uses in a program you should type check
the resulting AST according to the following rules:
- It is an error to use a function name for anything other than a function
call, or to use a declared variable name as a function name.
- The number and types of arguments for a function call must match
the number and types of function parameters.
- The resulting type of a function call is the type given before the
name of the function.
- A return statement must produce an expression of the same type as
the return type of the function.
- Variables of char, int, and float result in an item of that type.
- char, int, float and string literals produce a result of that type.
- The ! operation applies to int values and results in an int.
- The == and != operations apply to any two items of the same simple type.
- The <, >, <=, and >= operations may be applied to any two char, int
or float values and result in an int.
- The *, /, + and - operations may be applied to any two int or float
items and result in the same type.
- The && and || operations may be applied to any two int values and
result in an int value.
- The type of the expression on the right hand side of a declaration or
assignment (=) must be the same as the type of the name on the left hand
side.
- It is an error to reuse the same field name within a single structured type.
- The left hand argument of a . operator must be a variable of a structured type and the right hand argument must be the name of a field of that structured type. The resulting type is the type of the named field.
- A << can be applied to any simple type value.
- A >> can be applied to any non-function variable.
- The condition of an if or while node must be of an int type.
- COERCION -- Values of a char or int type may be promoted "up" one
level (a char may be coerced to an int and an int to a float, but not
a char to a float). For example, the condition of an if can be a
char value that is promoted to an int. You should insert a unary coercion
operation into the AST as needed.
- A legal program must contain a definition of a function named "main"
that has no parameters and a return type of int.
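As an illustration of how the coercion rule interacts with the arithmetic rule, here is a small Python sketch. It is one possible reading, not a prescribed design: the tuple-based AST nodes, the names RANK, coerce and check_arith, and the choice to promote both operands to the larger of the two types (and at least to int) are all assumptions made for the example.

```python
# Promotion order assumed from the coercion rule: char -> int -> float.
RANK = {"char": 0, "int": 1, "float": 2}

def coerce(expr, from_type, to_type):
    """Return expr converted to to_type, or None if that is not allowed.

    Only a single promotion step is accepted (char -> int or int -> float).
    A coercion is represented by a hypothetical ("coerce", type, expr) node.
    """
    if from_type == to_type:
        return expr
    if (from_type in RANK and to_type in RANK
            and RANK[to_type] - RANK[from_type] == 1):
        return ("coerce", to_type, expr)
    return None

def check_arith(op, left, left_type, right, right_type):
    """Type check *, /, + or -; return (new_node, result_type)."""
    if left_type not in RANK or right_type not in RANK:
        raise TypeError(f"operator {op} needs int or float operands")
    # Arithmetic is defined on int and float only, so aim for at least int.
    target = max(left_type, right_type, "int", key=RANK.get)
    new_left = coerce(left, left_type, target)
    new_right = coerce(right, right_type, target)
    if new_left is None or new_right is None:
        raise TypeError(
            f"operator {op} cannot combine {left_type} and {right_type}")
    return (op, new_left, new_right), target
```

Under these assumptions an int plus a float yields a float with a coercion node wrapped around the int operand, while a char combined with a float is rejected because it would need a two-level promotion. The same coerce helper could be reused for assignments, conditions and function arguments.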
Execution Order
Symbol table checking should only occur if no parser errors are detected.
The symbol table check should report any multiple declaration and undeclared
variable errors with the names of the variables.
Type checking should only occur if no parse or symbol table errors were
detected.
The type checker should report errors as appropriate based on the above
rules.
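The ordering described in this section amounts to stopping at the first phase that reports errors. A minimal Python sketch follows; the function run_phases and its three callback parameters are hypothetical stand-ins for your own parser, symbol table pass and type checker.

```python
def run_phases(source, parse, check_names, check_types):
    """Run each phase only if every earlier phase reported no errors.

    Assumed interfaces (not given by the assignment): parse returns
    (ast, errors); check_names and check_types each return a list of
    error messages.
    """
    ast, errors = parse(source)
    if errors:
        return None, errors      # skip symbol table and type checks
    errors = check_names(ast)
    if errors:
        return None, errors      # skip type checking
    errors = check_types(ast)
    if errors:
        return None, errors
    return ast, []               # safe to print the annotated code
```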
Output
If errors are detected, error messages should be printed.
If no errors are detected, an annotated version of the code should be
printed out.
During symbol table processing each function and variable declaration should
be given a unique number (give each declaration the next number in the
sequence starting with 1).
When printing out the code each declared name and use should be followed
by the number associated with that name.
Names of fields within a structure should also be numbered and that number
printed out when they are used.
Your code should also show any coercions that occur.
For example, if the input were:
    int a = 1;
    int b = a;
    struct S {
        double x;
        int y;
    };
    int c( int a , int b ) {
        S s1;
        float c;
        s1.x = a + b;
        c = 1;
        a = ( b * 2 );
    }
    int main( ) {
        c( 2 , 1 );
        return 0;
    }
The output should look something like this:
    int a(1) = 1;
    int b(2) = a(1);
    struct S(3) {
        double x(3.1);
        int y(3.2);
    };
    int c(4) ( int a(5) , int b(6) ) {
        S(3) s1(7);
        float c(8);
        ( s1(7) . x(3.1) ) = ( (float) ( a(5) + b(6) ) );
        c(8) = ( (float) 1 );
        a(5) = ( b(6) * 2 );
    }
    int main(9) ( ) {
        c(4) ( 2 , 1 );
        return 0;
    }
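The numbering used for declarations in the sample output can be sketched in Python as follows. The function number_declarations and its input format are hypothetical, and the "n.i" form for field numbers is inferred from the struct declaration in the sample rather than stated as a rule.

```python
import itertools

def number_declarations(decls):
    """Annotate declarations with their sequence numbers.

    decls is a list of (name, field_names) pairs in source order, where
    field_names is empty for anything that is not a struct. Returns the
    annotated names as strings such as "a(1)" or "x(3.1)".
    """
    counter = itertools.count(1)   # declarations are numbered from 1
    annotated = []
    for name, fields in decls:
        n = next(counter)
        annotated.append(f"{name}({n})")
        # Fields are numbered within their struct: n.1, n.2, ...
        for i, field in enumerate(fields, start=1):
            annotated.append(f"{field}({n}.{i})")
    return annotated
```

For the first four declarations of the sample input, number_declarations([("a", []), ("b", []), ("S", ["x", "y"]), ("c", [])]) gives a(1), b(2), S(3), x(3.1), y(3.2) and c(4).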
What To Turn In
Turn in documented versions of all of your code (including test code).
Also document your test cases and show results from your code on each
test file.
You will likely need to construct many test files in order to fully
exercise your code.
You should also write a team report on this part of the project and in
addition submit a short individual report from each member of the team.