CS 5641 (Fall 2004) Project

Computer Science 5641
Compiler Design
Project Part 4 - Symbol Table and Type Checking (50 points)
Due Thursday, December 2, 2004

Introduction

In this part of the project you will build a symbol table to connect uses with declarations using the rules described below and then you will build a type checker for the resulting annotated AST. In doing this you should make use of the parser and AST you implemented in part 3 of the project.

Name Rules

For the purposes of building your symbol table, you should apply the following rules:

A name exists in a scope from where it is defined to the end of that scope.
Each program has a global scope where global variable declarations and function definitions are inserted.
Each function introduces a new scope that contains both the parameters of the function and any variables declared within the function body.
A new scope is introduced by each block surrounded by curly braces ({ ... }).
A name may be declared at most once in each scope; declaring it a second time in the same scope is an error.
A use of a name corresponds to the most closely nested declaration of that name.
Every variable declared with a struct type has all of the fields listed in the structure.
The struct operator (.) combines a variable of a structured type (e.g., variable x) with a field of that structure (e.g., field f), as in x.f. Note that a field of a structure may itself be of another (previously declared) structure type.
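
The scope rules above can be sketched as a stack of dictionaries. This is only an illustrative sketch in Python, not a required design; the class name SymbolTable, its method names, and the error message wording are all assumptions made for the example.

```python
class SymbolTable:
    """Stack of scopes; each scope maps a name to its declaration record."""

    def __init__(self):
        self.scopes = [{}]   # index 0 is the global scope
        self.errors = []

    def enter_scope(self):
        # Called on entering a function or a { ... } block.
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, info):
        # A name may be declared only once in the current scope.
        scope = self.scopes[-1]
        if name in scope:
            self.errors.append(f"multiple declaration of '{name}'")
            return False
        scope[name] = info
        return True

    def lookup(self, name):
        # Search from the innermost scope outward, so a use binds to
        # the most closely nested declaration.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        self.errors.append(f"undeclared name '{name}'")
        return None


table = SymbolTable()
table.declare("a", "global int a")
table.enter_scope()                  # scope of a function
table.declare("a", "param int a")    # shadows the global a
inner = table.lookup("a")            # the parameter
table.exit_scope()
outer = table.lookup("a")            # the global again
```

Because a function's parameters and the variables declared in its body share one scope, a sketch like this would call enter_scope once for the function rather than once for the parameter list and again for the body.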

Type Checking

Once you have resolved all of the uses in a program, you should type check the resulting AST according to the following rules:

It is an error to use a function name anywhere other than in a function call, or to use a declared variable name as a function name.
The number and types of arguments for a function call must match the number and types of function parameters.
The resulting type of a function call is the type given before the name of the function.
A return statement must produce an expression of the same type as the return type of the function.
Variables of char, int, and float result in an item of that type.
char, int, float, and string literals each result in an item of that type.
The ! operation applies to int values and results in an int.
The == and != operations apply to any two items of the same simple type.
The <, >, <=, and >= operations can be applied to two char, int, or float values and result in an int.
The *, /, + and - operations may be applied to any two int or float items and result in the same type.
The && and || operations may be applied to any two int values and result in an int value.
The type of the expression on the right hand side of a declaration or assignment (=) must be of the same type as the name on the left hand side.
It is an error to reuse the same field name within a single structured type.
The left hand argument of a . operator must be a variable of a structured type and the right hand argument must be the name of a field of that structured type. The resulting type is the type of the named field.
A << can be applied to any simple type value.
A >> can be applied to any non-function variable.
The condition of an if or while node must be of an int type.
COERCION -- Values of a char or int type may be promoted "up" one level (a char may be coerced to an int and an int to a float, but not a char to a float). For example, the condition of an if can be a char value that is promoted to an int. You should insert a unary coercion operation into the AST as needed.
A legal program must contain a definition of a function named "main" that has no parameters and a return type of int.
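
As an illustration of how the coercion rule interacts with the arithmetic rule, the following Python sketch computes the result type of a binary arithmetic operation. It is one possible reading of the rules above, not a prescribed implementation; the names PROMOTION, coerce, and check_arith are hypothetical, and types are represented as plain strings for brevity.

```python
# One-level promotions only: char -> int and int -> float.
PROMOTION = {"char": "int", "int": "float"}


def coerce(actual, wanted):
    """Return 'same', 'coerce', or None if actual cannot become wanted."""
    if actual == wanted:
        return "same"
    if PROMOTION.get(actual) == wanted:
        return "coerce"      # a unary coercion node would be inserted here
    return None


def check_arith(left, right):
    """Result type of left (+, -, *, /) right, or None on a type error."""
    # Try int first so that no more promotion is applied than necessary.
    for target in ("int", "float"):
        if coerce(left, target) and coerce(right, target):
            return target
    return None
```

Under this reading, check_arith("char", "int") gives "int" and check_arith("int", "float") gives "float", while check_arith("char", "float") is a type error because a char cannot be promoted two levels.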

Execution Order

Symbol table checking should only occur if no parse errors are detected. The symbol table check should report any multiple-declaration and undeclared-name errors, giving the names involved. Type checking should only occur if no parse or symbol table errors were detected. The type checker should report errors as appropriate based on the above rules.
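
The ordering described above amounts to stopping at the first phase that reports errors. A minimal Python sketch follows, assuming each phase is available as a function that returns a list of error messages; the names run_checks, parse, build_symbols, and type_check are hypothetical stand-ins for whatever your own code provides.

```python
def run_checks(source, parse, build_symbols, type_check):
    """Run each phase only if every earlier phase reported no errors."""
    ast, errors = parse(source)      # assumed to return (ast, error list)
    if errors:
        return errors                # parse errors: skip later phases
    errors = build_symbols(ast)      # assumed to return an error list
    if errors:
        return errors                # symbol table errors: skip type check
    return type_check(ast)           # empty list means a legal program
```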

Output

If errors are detected, error messages should be printed. If no errors are detected, an annotated version of the code should be printed out. During symbol table processing each function and variable declaration should be given a unique number (give each declaration the next number in the sequence, starting with 1). When printing out the code, each declared name and each use should be followed by the number associated with that name. Names of fields within a structure should also be numbered, and that number printed out when they are used. Your code should also show any coercions that occur. For example, if the input were:

int a = 1;
int b = a;
struct S { 
  double x;
  int y;
};
int c( int a , int b ) {
  S s1;
  float c;
  s1.x = a + b;
  c = 1;
  a = ( b * 2 );
}
int main( ) {
  c( 2 , 1 );
  return 0;
}

The output should look something like this:

int a(1) = 1;
int b(2) = a(1);
struct S(3) {
  double x(3.1);
  int y(3.2);
};
int c(4) ( int a(5) , int b(6) ) {
  S(3) s1(7);
  float c(8);
  ( s1(7) . x(3.1) ) = ( (float) ( a(5) + b(6) ) );
  c(8) = ( (float) 1 );
  a(5) = ( b(6) * 2 );
}
int main(9) ( ) {
  c(4) ( 2 , 1 );
  return 0;
}
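
The numbering in the sample output can be produced with a single counter. The Python sketch below is only one way to do it; the class name Numberer and its methods are hypothetical, and the field format (struct number, a dot, then the field's position) is inferred from the struct declaration lines of the sample output rather than stated as a requirement.

```python
class Numberer:
    """Hands out declaration numbers in sequence, starting with 1."""

    def __init__(self):
        self.next_id = 1

    def declare(self, name):
        # Label a variable, parameter, or function declaration.
        label = f"{name}({self.next_id})"
        self.next_id += 1
        return label

    def declare_struct(self, name, fields):
        # The struct takes the next number; its fields are numbered
        # relative to it (e.g. 3.1, 3.2) and do not advance the counter.
        struct_id = self.next_id
        self.next_id += 1
        labels = [f"{name}({struct_id})"]
        labels += [f"{field}({struct_id}.{i})"
                   for i, field in enumerate(fields, start=1)]
        return labels


n = Numberer()
first = [n.declare("a"), n.declare("b")]
struct_labels = n.declare_struct("S", ["x", "y"])
after = n.declare("c")
```

For the sample input, this yields a(1) and b(2), then S(3) with fields x(3.1) and y(3.2), and then c(4) for the function that follows.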

What To Turn In

Turn in documented versions of all of your code (including test code). Also document your test cases and show results from your code on each test file. You will likely need to construct many test files in order to fully exercise your code. You should also write a team report on this part of the project and in addition submit a short individual report from each member of the team.