CS 214: Programming Languages
Spring 2009

Home|Syllabus|Schedule
<<|>>|ANTLR API|CITkit API|PolyD API|EasyMock API

Adding Integers
Interpreters, Iteration 5

User Story

User Story #5: Interpreter evaluates a two-value sum.

A Hobbes program can be the addition of two values (any combination of integers and strings).

Examples: 2 + 3 and "hi " + "there". These evaluate to 5 and hi there, respectively.

For this particular iteration, you're only going to worry about adding two integers together.

Exploring ASTs and TIRs

Now the structure and workings of ASTs and TIRs become more important. A single integer or single string can be represented as one object. But a sum has to be represented as three objects (at least): the two operands and the operator.

Storing these three objects should reflect the structure of the original program. The structure of the program tells us how to interpret the code, so presumably your interpreters should know and use the structure.

The front end of an interpreter (or compiler) is the portion of the interpreter that reads the input (a stream of characters) and turns it into a useful data structure. The back end (the part you're writing) does the actual interpreting using the useful data structure.

And the most useful data structure to represent a program: a tree.

Abstract Syntax Trees

An abstract syntax tree (AST) is a tree data structure representing the syntax (i.e., original structure) of the program. ANTLR, a library used in the front end, will generate its own version of an AST from the parser (one of the phases of the front end).

For ANTLR, an AST consists of tokens (i.e., individual elements of a program like a single integer or a single string or a single operator) organized into general tree with an arbitrary number of children.

A single integer would be just a single token wrapped in a single-node tree. You've seen this in testing the interpretation of an integer. Similarly, you've seen how a string is wrapped up the same way. These are hardly trees, of course, because these are hardly programs! However, take two single integers and add them together, and you end up with a more interesting tree.

The expression 5 + 11 is parsed as this AST:

5 + 11

ANTLR's ASTs come with a variety of methods to help navigate through an AST: getChild(int), addChild(Tree), getParent(). You've already used the getType() and getText() methods.

If you wanted to build this tree yourself, here's some code to do that:

Tree root = new CommonTree(new CommonToken(HobbesParser.OPERATOR, "+"));
root.addChild(new CommonTree(new CommonToken(HobbesParser.INTEGER, "5")));
root.addChild(new CommonTree(new CommonToken(HobbesParser.INTEGER, "11")));

Tree Intermediate Representations

ASTs are a procedural way of processing information. You have methods to navigate through the structure of an AST, but the AST itself doesn't really do much. ASTs often only indicate the structure of a program. Sometimes you want to annotate the tree in some ways to provide more information to your interpreter.

Many interpreters and compilers use a tree intermediate representation (TIR). It's a little more powerful than an AST, and sits between the front end and the ultimate result (hence calling this data structure an "intermediate representation").

ANTLR also provides a tree parser which will parse its own ASTs. Some people use a tree parser to do the actual interpretation or compilation or other static analysis. I prefer to use a tree parser to build a TIR for me. The TIR hierarchy of classes is something I put together, and it's provided to you in the CITkit library.

Instead of providing generic building and navigation methods, each TIR class has its own particular structure and methods. So while the picture of 5 + 11 above is not a bad approximation of its TIR, the code to build the equivalent TIR is more telling:

OperatorETIR operator = new OperatorETIR(
    new IntegerETIR(5),
    OperatorETIR.Operator.ADD,
    new IntegerETIR(11));

These objects know exactly what kind of a TIR they are, so there's no extra "token type" to carry around. Also, data is carried around in its native form (e.g., integers as ints, not Strings).

Implementing

Let's assume for this iteration that the operands to an addition are always integers. The user story itself says otherwise, but stories are often broken down into smaller implementation steps.

The user story itself tells us to assume that there's only two operands to and addition, so there's no need (yet) to worry about things like 5 + 11 + 4. Similarly, you don't have to worry about multiplication or subtraction or division or modulus or exponentiation or etc...

Procedural

Reuse createIntegerAST(String). Also, String.valueOf(int) turns an int into a String. Be sure to try at least one negative number and zero.

Write a createAdditionAST(int, int) method in ProceduralHobbesInterpreterTest to generate an AST for an addition. Write a new test method shouldInterpretAddition(), and make three assertions with three different trees. Compile, and red bar.

The problem should be that the switch statement in your interpret(Tree) method doesn't handle an OPERATOR.

Add another case to handle OPERATORs. Have it throw an exception or maybe return null. Compile, and red bar.

Equipped with this iteration's assumptions, you can assume that an OPERATOR has two children, and they're both INTEGERs. So just grab the two children (tree.getChild(0) and tree.getChild(1)), turn their text (tree.getChild(0).getText()) into ints. Integer.decode(String) does the best job turning Strings into ints.

Write working code for the OPERATOR case. Compile, and green bar.

Write a few new Hobbes scripts, and try your driver on them. (The object-oriented interpreter will have problems with them, of course.)

Object Oriented

This is going to require a bit more work. So far, all your interpret(ExpressionTIR) does is turn the expression into a String with toString(). That won't work well for this iteration, at least not for an addition.

Write a new shouldInterpretAddition() test method in OOHobbesInterpreterTest. Make three assertions about three different additions. Compile, and red bar.

You should get an error about the strings: "16" is not the same as "5+11".

The parameter of interpret(ExpressionTIR) is originally of little help because it only thinks of itself as a generic expression. But this is an inheritance game. ExpressionTIR is really an interface, so while it's perfectly fine and good to declare variables and parameters of this type, the actual object is an IntegerTIR, a StringETIR, or an OperatorETIR.

You could ask it: (tir instanceof IntegerTIR). This is almost always followed by a cast: IntegerETIR i = (IntegerETIR) tir;. Whenever you use Java's instanceof operator or an object cast, you're not doing object-oriented programming. So let's avoid this as much as possible.

The object-oriented solution you'll use is a pretty significant overkill, but it'll set the OO framework for this interpreter until you're finished with it.

You're going to use what's known as the visitor pattern. If you had the ability to change the IntegerETIR and OperatorETIR classes, you could add in interpret() methods to those classes to do the interpreting. However, you do not have the ability to change these classes. Even if you did, there are many different interpreters (and compilers and static analyzers) that you could implement, and would you really want to implement methods in the class hierarchy for each one of them? (Hint: no!)

The visitor pattern is a design pattern that abstracts an algorithm over a class hierarchy. The classes in the hierarchy (like IntegerETIR and OperatorETIR) accept a visitor and call the appropriate method for their type. Your algorithm-implementing class only has to implement a visitor interface and the methods it requires.

This is most easily seen from an example, and you're ready to make the example yourself.

First, do yourself a favor, and change a code template in Eclipse. The example on that page explains what you should change and what to change it to.

To implement an algorithm with the visitor pattern, your class must implement a visitor interface. You're interested in the ExpressionTIRVisitor<T>.

Eclipse quick fix: click on the warning symbol in the editor's gutter. Or, click on the error, and press Ctrl-1. Your options change depending on the error; for this error, choose to implement the missing methods.

Change OOHobbesInterpreter to implement ExpressionTIRVisitor<String>. Take care of the 18 errors this produces with a quick fix on the class, implementing the missing methods.

You should now have eighteen more methods defined in the class. You'll only need three of them for this iteration (and for the near future).

Compile, and get the same red bar.

Turning OOHobbesInterpreter into a Hobbes visitor is only part of the solution. interpret(ExpressionTIR) is still handling the interpretation. You need that method to use its own object as the visitor. This will look a little odd:

public String interpret(ExpressionTIR expression) {
  return expression.accept(this);
}

accept(ExpressionTIRVisitor<T>) is a method specified by the visitor pattern. It's job is to take the visitor and use one of its methods to process the expression. All of that is implemented for you in the TIR hierarchy. What you've provided in OOHobbesInterpreter is the visitor itself. When expression is an IntegerETIR, then visitIntegerETIR(IntegerETIR) will be called.

Change interpret(ExpressionTIR). Compile, and red bar.

The red bar should now include failures for IntegerETIR and StringETIR. But the code that made them work was nothing more than calling toString() on the expressions. It's just that with the visitor, where you call toString() has changed.

Change visitIntegerETIR(IntegerETIR) and visitStringETIR(StringETIR) to compute correctly. Compile, and red bar.

You should have just the one error for an OperatorETIR assertion.

Care to guess which method you have to change in order to handle it?

Try a little trick.

Comment out two of your OperatorETIR assertions, and hard code the result of the remaining assertion in the proper visit method of OOHobbesInterpreter. Compile, and green bar.

The green bar is a cheat, of course, because you've hard coded the answer to just the one assertion. But at least you've confirmed where you have to change the code.

Uncomment the assertions, compile, and red bar.

So now the question is how to process the OperatorETIR. Suppose I call this value operator. If I want the first operand (the one on the left in the TIR), I can use operator.getLeft(). Unfortunately, that method returns an ExpressionTIR. Again, because of the inheritance hierarchy, an IntegerTIR is an ExpressionTIR, so creating the TIR isn't a problem. The problem is that it's harder for you to get the integer value out!

The answer for this iteration is a cast.

IntegerETIR left = (IntegerETIR) operator.getLeft();

Then you can call left.getValue() to get the actual int value out for an addition.

You should be puzzled and confused (and angry?) by the use of a cast here since I summarily dismissed them a few paragraphs ago. Using a cast means that you're not doing object-oriented programming. You're going to cheat on this iteration to keep things relatively simple. Soon (i.e., by the time you finish the current user story), you'll clean this up.

Don't forget String.valueOf(int).

Write good code for the visit method you're working on. Compile, and green bar.

Try out both drivers on some addition programs.