![]() |
CS 382: Special Topics - Compilers Spring 2005 |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||
Hobbes |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||
What is Hobbes?Andrew Appel, in his compiler textbooks, proposes a relatively simple language named "Tiger" that can be easily extended with more advanced features. However, we are not using one of his books, there are more than a few obnoxious parsing issues with Tiger, and there are just too many things being called "Tiger" these days. Consquently, I've modified Appel's Tiger, and I've called the resulting language "Hobbes". Support FilesThere are a couple support files for the Hobbes AST that serves as our intermediate representation. These files were written with Java 2 SDK 5.0, but they should be backwards compatible if you're using an older SDK; let me know if they cause a problem with older SDKs.
NotationThe tokens in this document are abbreviated (e.g., id for "identifier"); if any are unclear, let me know. Different names may be used for different semantic roles, not syntactical difference; for example, syntactically id and typeId are exactly the same, but semantically the former specifies any identifier while the latter is specifically for a type identifier (which can be determined only during the semantic analysis). Non-terminals in the grammar are denoted in an emphasized font (e.g., id, type). Words (including punctuation!) in a monospaced font are keywords or operators. The productions also use the regular-expression star (i.e., *) and Klene star (i.e., +); the meanings are the same here. Parentheses are used to group tokens. The empty string is represented by epsilon. Lexical IssuesAn identifier is a sequence of letters, digits,
and underscores, starting with a letter. Case matters (e.g.,
Comments can appear between any two tokens.
Comments start with DeclarationsVariables, types, and functions must all be declared.
Data TypesHobbes has four types of data: integers, strings, arrays, and records. Hobbes also has type aliases.
The integer type is named A record type is specified by a list of the fields of the record. The two identifiers in a field declaration are the field name and the field's type (which must be an identifier); the field's name comes first. Different records can use the same field names. The length of an array is not part of its type. The length is determined at runtime. Each declaration of an array or record type is unique and incompatible with all other type declarations even if the component types are simliar. Data types can be mutually recursive:
type intlist = { head: int, tail: intlist }
type tree = { key: int, children: treelist }
type treelist = { head: tree, tail: treelist }
New type expressions create new data types (e.g.,
type a = array of int; type b = a; type c = b; Types Variables
Every variable must be declared before it is used; every variable must be initialized (the exp in these productions). However, not every variable must be typechecked. Since Functions
The first production is for a producedure (which does not return a value); the second production is for a function (which, as you might guess, does return a value). All arguments are passed by value. Function may be recursive:
function treeLeaves(t: tree) : int =
if t = nil then 1
else treelistLeaves(t.children)
endif
function treelistLeaves(L: treelist) : int =
if L = nil then 0
else treeLeaves(L.head) + treelistLeaves(L.tail)
endif
Scope RulesA The scope of parameters are the body of the function. Nested scopes are permitted. Inner scopes can access the variables in outer scopes. Inner scopes can shadow variables in outer scopes. Hobbes uses two namespaces, one for variables and functions, the other for types. Thus it is possible to use the same identifier in the same scope as both a type and a variable name. L-ValuesVariables, procedure parameters, fields of records, and elements of arrays are all l-values (i.e., locations whose value can be read or assigned).
The meanings of these productions are fairly straightforward; the exp of the array-element access must be an integer. ExpressionsAn l-value, when used as an expression, evaluates to the contents of its location. Some expressions do not evaluate to a value: procedure calls, assignment, if-then (lacking an else), while, break, and sometimes if-then-else. The value A sequence of expressions can be specified with parentheses:
The expressions are evaluated in the order listed, and the value of this sequence is the value of the very last expression (if any). The token sequence An integer constant is a sequence of decimal digits. Unary minus can be used to negate an integer constant. A string literal is a double-quote delimited sequence of zero or
more printable characters, whitespace, and escape characters.
Hobbes recognizes these escape characters (with their standard
meanings): A function is invoked with arguments to fill the parameters. Arguments are matched to parameters in the order listed, left-to-right. Arithmetic is done with the Comparisons are done with Boolean operators Precedence (from highest to lowest):
The arithemtic and boolean operators are left associative. The comparison operators do not associate. A record is created with the
Arrays too:
The first expression is of type integer and is the length of the array; the second expression is the initial value for each element of the array. Arrays are indexed 0 up to (but not including) the length of the array. In case of both arrays and records, typeId must be a valid array or record type, respectively. When records or arrays (not their contents) are involved in an assignment, the pointer to the record or array is assigned to the destination. This applies also to parameter passing. Records and arrays have inifinite extent---their lifetime extends beyond the code that created them. An assignment statement has this form: lvalue
The if-then-else statement comes in two forms:
Semantically the statement works as you would expect. (Remember that 0 is "false"; non-zero is "true".) The then and else expressions are evaluated only if necessary. The second form of the statement produces "no value". A while loop looks like this:
As long as the first expression evaluated to a non-zero value, the body is executed. A for loop looks like this:
The id variable is implicitly declared (as an integer variable) for this statement; its scope extends only to this loop. The upper and lower bounds are evaluated exactly once, prior to entering the body of the loop. The body is never executed if the upper bound is less than the lower bound. The break statement, A let expression declares variables for a sequence of expressions:
There are zero or more expressions in the expression sequence, separated by semi-colons. There will be at least one declaration (otherwise, why use the let expression?). The result of the let expression (if any) is the value of the last expression in the sequence. Standard Library
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|