CS 214: Programming Languages
Spring 2009

Home|Syllabus|Schedule
<<|>>|ANTLR API|ANTLR Testing API|CITkit API

Getting Started
FrontEnd, Iteration 1

User Story

Changes to your interpreter should take place based on user stories from your client. Your client wants an alternative interpreter: the look of Scheme with the Hobbes interpreter. Let's call this new language "Schobbes".

User Story #2: Arithmetic expressions in Schobbes

Arithmetic expressions (binary arithmetic only) are written prefix with parentheses.

Example: (+ 2 3) and (+ 5 (* 9 18)).

You're not going to get all of this done in this iteration. In fact, all you're going to worry about it recognizing the number 5 as a valid Schobbes program!

Acceptance Test and a Driver

Create a new folder acceptance/ciat/schobbes. Create a CIAT test in this folder for a program consisting only of the integer 5.

For simple data like integers and strings, these acceptance tests will be the same as the ones for Hobbes. So take an acceptance test that you wrote for Hobbes and tweak it for Schobbes.

You need to run the Schobbes tests separately from the Hobbes tests since they'll use a different driver. But you can run them from the same Rakefile.

Add a second CIAT::Suite().run to the ciat task in acceptance/Rakefile.

Finally, for debugging purposes, you may want to be able to run the Schobbes interpreter on the command line (like ./script/hobbes).

Make a copy of script/hobbes as script/schobbes. In script/schobbes, use SchobbesCLI instead of HobbesCLI.

Run the acceptance test, and the Schobbes test should fail. Run the Schobbes driver, and it should fail.

The failure should be that the class cannot be found (because it doesn't exist).

Let's do a major cheat.

Eclipse tip: try a copy-and-paste in the Package Explorer. It takes care of renaming the class in the copy!

Copy HobbesCLI as SchobbesCLI.

For a false sense of accomplishment:

Run all of the CIAT tests (old and new) to see them pass!

The new driver itself has a new name, but it's using all of the old classes, so it should work.

A New Front End

The SchobbesCLI is using a HobbesFrontEnd and a IHobbesFrontEnd. As you should imagine, the latter is an interface for the former.

Use SchobbesFrontEnd and ISchobbesFrontEnd in SchobbesCLI. (Don't rename the classes, just use these names instead.)

Of course you get compilation errors since they don't exist.

Eclipse's quick-fix can greatly help you fix these problems. Create them in this order so that quick-fix will automatically insert inheritance for you.

  1. Create a new interface ...hobbes.frontend.ISchobbesFrontEnd.
  2. Create a new class ...hobbes.frontend.SchobbesFrontEnd which implements ISchobbesFrontEnd.

You'll then need to take care of defining a constructor and a method (as indicated by compiler errors).

I usually leave stubbed constructors empty. Stubbed methods throw an "unimplemented" exception.

Write a stub for the required constructor. And add buildTIR() to the ISchobbesFrontEnd hierarchy.

This brings us right up the real front end. A Tree Intermediate Representation (TIR) is the final product from the front end. Before you fill in the code for SchobbesFrontEnd#buildTIR() (which would sit for quite a while with compilation errors), you have to get your ANTLR grammars up and running.

Quick-and-Dirty ANTLR

The Lexer and Parser

Traditionally, a front end is built with a lexer and a parser. To make your interpretation easier, you'll have use a "tree parser", but you have to make a basic parser with a lexer first.

Let's first set up some files.

  1. In ANTLRWorks, create a file Schobbes.g in ...interpreter/hobbes/frontend (in the src folder).
  2. Create test classes SchobbesLexerTest and SchobbesParserTest in the same package but in the test source folder.

Here's how to get Schobbes.g started:

grammar Schobbes;  // the name of the grammar

options {
  language=Java;   // generate Java classes
  output=AST;      // parser should generate ANTLR ASTs
  ASTLabelType=CommonTree;  // the class to use for ASTs
}

@header {
  // the Java package to put the classes in
  package edu.calvin.jdfrens.cs214.interpreters.hobbes.frontend;
}
@lexer::header {
  // the Java package to put the classes in
  package edu.calvin.jdfrens.cs214.interpreters.hobbes.frontend;
}

Add this code to Schobbes.g, and delete the comments.

You'll get an ANTLR error because you don't actually have any grammar definitions. Everything you have in the file right now is meta-information: what the grammar is called, how the grammar should be generated, where to generated it, etc. Here's a parser rule that'll make ANTLR happy:

program
  : 'hey' EOF
  ;

The spacing isn't too terribly critical, but you should be consistent in your formatting. For right now, we aren't going to worry about what this does. ANTLR is sufficiently happy, so we're sufficiently happy.

Enter this definition into the grammar file (in ANTLRWorks). Generate Java classes with Generate -> Generate Code. In Eclipse, refresh your project, and look at the new Java classes that have been generated for you.

However, you should notice that ANTLR has generated three files for you: SchobbesLexer, SchobbesParser, and Schobbes.tokens. Even a grammar as small as this generates a fair amount of code!

Tree Builder with a Tree Parser

Once your parser produces an AST, you can traverse the AST itself in your Java code, or use an ANTLR tree parser which will parse and traverse the AST for you. You'll use a tree parser to turn ANTLR's primitive (and procedural) AST into a TIR.

Create a new file SchobbesTIRBuilder.g in ...hobbes/frontend. Create a new test class SchobbesTIRBuilderTest.

A tree parser looks just a bit different:

tree grammar SchobbesTIRBuilder;

options {
  language=Java;
  tokenVocab=Schobbes;    // based on the name of the lexer-parser grammar
  ASTLabelType=CommonTree;
}

@header {
  package edu.calvin.jdfrens.cs214.interpreters.hobbes.frontend;
  
  // to build TIRs, you need to import the TIR classes!
  import org.norecess.citkit.tir.*;
  import org.norecess.citkit.tir.expressions.*;
  import org.norecess.citkit.tir.lvalues.*;
}

Copy this code into SchobbesTIRBuilder.g without the comments.

Again, ANTLR will complain because you haven't given it any tree-parser rules. Here's a very minimal rule:

program returns [ExpressionTIR tir]
  :
  ;

This will probably be the last time I remind you to generate the code from your grammar files and refresh your Eclipse project. You have to do this every time you make a change to a grammar file.

Add this to the end of SchobbesTIRBuilder.g. Generate code from this grammar in ANTLRWorks, refresh your Eclipse project, and look at the code.

This is a good point to stop and commit your files to a version control system. The grammars generate four files: Schobbes.tokens, SchobbesLexer.java, SchobbesParser.java, and SchobbesTIRBuilder.java. Since these files are generated, there's little reason to store them in your repo (and good reasons not to). Ignore those four files in the ...hobbes.frontend package, and commit all of your other changes including the .g files.

Back to the Top Level

Run your CIAT tests.

The Schobbes tests should fail because of the SchobbesFrontEnd#buildTIR() method. Now that you have classes for the lexer, parser, and tree builder, you can write this method. Or just copy it:

public ExpressionTIR buildTIR() throws RecognitionException, IOException {
        String name = myFile.getAbsolutePath();
        // create the lexer to read characters from a file
        SchobbesLexer lexer = new SchobbesLexer(new ANTLRFileStream(name));
        // create a parser from the lexer
        SchobbesParser parser = new SchobbesParser(new CommonTokenStream(lexer));
        // use the parser's "program" rule to parse the input
        Tree ast = parser.program().tree;
        // create a TIR builder from the parser's AST
        SchobbesTIRBuilder builder = new SchobbesTIRBuilder(
                        new CommonTreeNodeStream(ast));
        // build the TIR
        ExpressionTIR tir = builder.program();
        // throw an exception if there are any errors
        if (anyErrors(lexer, parser, builder)) {
                throw new RuntimeException("There were syntax errors!");
        }
        return tir;
}

// each stage of the front end is a "base recognizer",
// and each recognizer keeps track of the syntax errors it encounters
private boolean anyErrors(BaseRecognizer... stages) {
        for (BaseRecognizer stage : stages) {
                if (stage.getNumberOfSyntaxErrors() > 0) {
                        return true;
                }
        }
        return false;
}

The constructors and interface to the lexer, parser, and tree parser are a bit awkward with ANTLR.

Why delete the comments? Because they're here for pedagogical reasons. As soon as they get into your code base, they're in production. If you even need to try to understand this code again, you can come back here again.

Define an instance variable (myFile), and initialize it in the constructor. Replace your buildTIR() with this version, and add the anyErrors() method. Delete the comments, and add the same throws declaration in the interface.

Run the CIAT tests.

The complaint now will be that for any input you try to give the Schobbes front end, it wants to find an h.

Back to the Lexer

Why an h? Because that's what you have the parser looking for!

program
  : 'hey' EOF
  ;

This says that when the program rule (or production) is called, the lexer needs to return the text hey followed by the end-of-file.

You'll fix this production in just a second, but let's start with the lexer. It's the lexer that's supposed to recognize and identify an integer. Let's start with a really simple test and lexer.

Create a test method shouldRecognizeIntegers() in SchobbesLexerTest. Also create a set up method that initializes an instance variable (myTester) to be a new ANTLRTester(new SchobbesFrontEndFactory()).

The ANTLRTester class comes from the ANTLR Testing library. SchobbesFrontEndFactory() is something you'll have to define.

The IANTLRFrontEnd interface is misnamed. It should be called IANTLRFrontEndFactory since that more closely reflects its purpose.

Create a class SchobbesFrontEndFactory in the test folder that implements the IANTLRFrontEnd interface. Implement the three methods required by the interface with "unimplemented" exceptions.

Now you can finally write an assertion:

assertToken(SchobbesParser.INTEGER, "5", myTester.scanInput("5"));

assertToken() and assertTree() (later in this iteration) come from the ANTLR Testing library. A static import of org.norecess.antlr.Assert.* is really helpful.

This assertion says that when the input 5 is scanned by the lexer, it should return a token whose type is "integer" and its text is 5. Tokens are the objects that a lexer returns. A token is a collection of characters that form a "word" in the source language.

Before you add any more assertions, let's get this one working. The compiler is complaining because INTEGER isn't a valid token type. ANTLR implements these token types as int constants in the lexer and parser classes. In order to get a token type created, you need to declare it as a lexer rule.

To get this one assertion to pass, you need a lexer rule that recognizes 5 as an integer. Or, to state it more declaratively, an integer is the text 5.

INTEGER
  : 'a'
  ;

Of course, you can't just jump directly to a solution without checking to make sure that the assertion fails. So this lexer rule says that an INTEGER is the text a.

Schobbes.g has your "grammar" in it. SchobbesTIRBuilder.g has your "tree grammar" or "tree-parser grammar" or "tree parser" in it.

Add this rule to your grammar, and red bar.

The error will be that you aren't creating a lexer in SchobbesFrontEndFactory:

public Lexer createLexer(String input) {
        return new SchobbesLexer(new ANTLRStringStream(input));
}

Replace this method in SchobbesFrontEndFactory. Still a red bar.

Read the error message carefully. You should see indicates that the lexer didn't like the text 5.

Fix the INTEGER production. Green bar.

Unfortunately...

Run the CIAT tests. Red bar.

You should have two problems. First, your Schobbes front end won't like the newline character at the end of the source file. Second, the parser still wants to see the text hey!

Ignoring Whitespace

Create a test method shouldIgnoreWhitespace() in SchobbesLexerTest. Duplicate the assertion from shouldRecognizeIntegers(), but change the scanned input to "5\n". Red bar.

Do not change the first "5" in the assertion, only change the scanned input. The whole idea here is that the "5\n" is scanned and the newline thrown away.

The solution is to recognize the newline character in the lexer, and then skip it.

WHITESPACE
  : '\n'
    { skip(); }
  ;

The curly braces here enclose Java code that says the token just recognized should be ignored. So if you go scanning "5\n", you'll only get the INTEGER token back.

Add this to the Schobbes grammar. Green bar.

CIAT tests still red bar.

The CIAT test should just complain about 5 not being hey.

The Parser

Open SchobbesParserTest. Create myTester and a set up method as you did for SchobbesLexerTest. Create a test method shouldParseInteger().

Here's an assertion for parsing:

assertTree(
  SchobbesParser.INTEGER,
  "(5)",
  myTester.scanInput("5").parseAs("program")
  );

This asks the ANTLR tester to scan the input 5 and give the token to the parser, and the parser should parse the tokens as a program (because that's the name of the one parser rule you currently have). The parser returns an AST. The root of this tree should be an INTEGER, and as a string it will look like (5). (Why there are parentheses around the 5 will become clear in a later iteration.)

Add this assertion. Red bar.

The red bar is because this method is poorly defined in SchobbesFrontEndFactory:

public Parser createParser(TokenStream tokens) {
        return new SchobbesParser(tokens);
}

Replace the method, and still red bar.

That's because the production still says this:

program
  : 'hey' EOF
  ;

This rule says that a program consists of the text hey followed by the end-of-file. You could replace 'hey' with '5', but then it won't be tagged as an INTEGER. Instead, use INTEGER itself, replacing 'hey'.

Replace 'hey' with INTEGER. Red bar.

The error message you'll get won't be of much help. The problem is that as written, you're actually returning two ASTs: the INTEGER and the EOF. You don't need that end-of-file indicator any more, so tell ANTLR to ignore it with the ! operator.

Replace EOF with EOF! in the program rule. Green bar.

Run the CIAT tests. Red bar.

The problem is now the TIR builder.

The TIR Builder

The CIAT test should give you a null-pointer exception. This is because the SchobbesTIRBuilder returns null by default. The first step (as always) is to get a failing unit test.

Create an ANTLRTester instance variable and setup method in SchobbesTIRBuilderTest as you did for the two other test classes. Create a test method shouldBuildIntegerTIR().

Here's an assertion:

assertEquals(
  new IntegerETIR(5),
  myTester.scanInput("5").parseAs("program").treeParseAs("program")
  );

The use of myTester is the same as before. ANTLRTester objects use a fluent interface to do the heavy lifting and stay readable. This time you can use the standard assertEquals() because the tree builder returns real objects that you want to manipulate in Java. In this case, you're getting back an IntegerETIR.

Add this assertion. Red bar.

The failure is this missing implementation:

public TreeParser createTreeParser(Tree ast) {
        return new SchobbesTIRBuilder(new CommonTreeNodeStream(ast));
}

Replace this method in SchobbesFrontEndFactory. Red bar.

Now the failure is due to the TIR builder itself. You have this one rule:

program returns [ExpressionTIR tir]
  :
  ;

The [ExpressionTIR tir] specifies what data you're going to return from this rule. (You can, in fact, return more than one thing.) You need to make Java declarations in the square brackets, so this is saying you have a Java variable tir to assign a value to.

But first, you have to recognize an integer. Here in this tree parser, an integer still goes by the same name!

program returns [ExpressionTIR tir]
  : INTEGER
  ;

Modify the rule in the TIR builder. Red bar, same reason.

Keep in mind that program rule returns a TIR only if you put one in tir. As you saw in the rule for WHITESPACE, you can add Java code in curly braces at the end of a rule. For WHITESPACE, you had to know that skip() was a method you could call freely. Here, you just need to know that you should assign a value to tir. What you assign to it can be any valid Java expression.

program returns [ExpressionTIR tir]
  : INTEGER
    { tir = new IntegerETIR(5); }
  ;

Seems a bit like cheating... But the tests should be fine!

Add this Java code to the program rule. Green bar.

And the kicker:

Run the CIAT tests. They all pass!

Deep Breath

There's a lot going on in this iteration, and you don't have much you can show off for it. You have a lot of new code, but it has a really limited audience. You have, however, set up quite a few things for future iterations that will make them a bit easier.

Whatever shortcuts you've taken so far (like hardcoding a 5 in the TIR builder) will be fixed as you push through the story that motivated this work.