Getting Started
FrontEnd, Iteration 1
User Story
Changes to your interpreter should take place based on user stories from your client. Your client wants an alternative interpreter: the look of Scheme with the Hobbes interpreter. Let's call this new language "Schobbes".
User Story #2: Arithmetic expressions in Schobbes
Arithmetic expressions (binary arithmetic only) are written prefix with parentheses.
Example: (+ 2 3)
and (+ 5 (* 9
18))
.
You're not going to get all of this done in this iteration. In fact, all you're going to worry about it recognizing the number 5 as a valid Schobbes program!
Acceptance Test and a Driver
Create a new folder acceptance/ciat/schobbes
.
Create a CIAT test in this folder for a program consisting only of
the integer 5
.
For simple data like integers and strings, these acceptance tests will be the same as the ones for Hobbes. So take an acceptance test that you wrote for Hobbes and tweak it for Schobbes.
You need to run the Schobbes tests separately from the Hobbes
tests since they'll use a different driver. But you can
run them from the same Rakefile
.
Add a second CIAT::Suite().run
to the
ciat
task in acceptance/Rakefile
.
- Use a different processor, a
SchobbesCLI
class. - The input folder should be
ciat/schobbes
. - The report should be called
schobbes.html
.
Finally, for debugging purposes, you may want to be able to run
the Schobbes interpreter on the command line (like
./script/hobbes
).
Make a copy of script/hobbes
as
script/schobbes
. In script/schobbes
, use
SchobbesCLI
instead of HobbesCLI
.
Run the acceptance test, and the Schobbes test should fail. Run the Schobbes driver, and it should fail.
The failure should be that the class cannot be found (because it doesn't exist).
Let's do a major cheat.
Eclipse tip: try a copy-and-paste in the Package Explorer. It takes care of renaming the class in the copy!
Copy HobbesCLI
as SchobbesCLI
.
For a false sense of accomplishment:
Run all of the CIAT tests (old and new) to see them pass!
The new driver itself has a new name, but it's using all of the old classes, so it should work.
A New Front End
The SchobbesCLI
is using a
HobbesFrontEnd
and a IHobbesFrontEnd
. As
you should imagine, the latter is an interface for the former.
Use SchobbesFrontEnd
and
ISchobbesFrontEnd
in SchobbesCLI
.
(Don't rename the classes, just use these names
instead.)
Of course you get compilation errors since they don't exist.
Eclipse's quick-fix can greatly help you fix these problems. Create them in this order so that quick-fix will automatically insert inheritance for you.
- Create a new interface
...hobbes.frontend.ISchobbesFrontEnd
. - Create a new class
...hobbes.frontend.SchobbesFrontEnd
which implementsISchobbesFrontEnd
.
You'll then need to take care of defining a constructor and a method (as indicated by compiler errors).
I usually leave stubbed constructors empty. Stubbed methods throw an "unimplemented" exception.
Write a stub for the required constructor. And add
buildTIR()
to the ISchobbesFrontEnd
hierarchy.
This brings us right up the real front end. A Tree Intermediate
Representation (TIR) is the final product from the front end.
Before you fill in the code for
SchobbesFrontEnd#buildTIR()
(which would sit for
quite a while with compilation errors), you have to get
your ANTLR grammars up and running.
Quick-and-Dirty ANTLR
The Lexer and Parser
Traditionally, a front end is built with a lexer and a parser. To make your interpretation easier, you'll have use a "tree parser", but you have to make a basic parser with a lexer first.
Let's first set up some files.
- In ANTLRWorks, create a file
Schobbes.g
in...interpreter/hobbes/frontend
(in thesrc
folder). - Create test classes
SchobbesLexerTest
andSchobbesParserTest
in the same package but in thetest
source folder.
Here's how to get Schobbes.g
started:
grammar Schobbes; // the name of the grammar options { language=Java; // generate Java classes output=AST; // parser should generate ANTLR ASTs ASTLabelType=CommonTree; // the class to use for ASTs } @header { // the Java package to put the classes in package edu.calvin.jdfrens.cs214.interpreters.hobbes.frontend; } @lexer::header { // the Java package to put the classes in package edu.calvin.jdfrens.cs214.interpreters.hobbes.frontend; }
Add this code to Schobbes.g
, and delete the
comments.
You'll get an ANTLR error because you don't actually have any grammar definitions. Everything you have in the file right now is meta-information: what the grammar is called, how the grammar should be generated, where to generated it, etc. Here's a parser rule that'll make ANTLR happy:
program : 'hey' EOF ;
The spacing isn't too terribly critical, but you should be consistent in your formatting. For right now, we aren't going to worry about what this does. ANTLR is sufficiently happy, so we're sufficiently happy.
Enter this definition into the grammar file (in ANTLRWorks). Generate Java classes with Generate -> Generate Code. In Eclipse, refresh your project, and look at the new Java classes that have been generated for you.
However, you should notice that ANTLR has generated three files
for you: SchobbesLexer
, SchobbesParser
,
and Schobbes.tokens
. Even a grammar as small as this
generates a fair amount of code!
Tree Builder with a Tree Parser
Once your parser produces an AST, you can traverse the AST itself in your Java code, or use an ANTLR tree parser which will parse and traverse the AST for you. You'll use a tree parser to turn ANTLR's primitive (and procedural) AST into a TIR.
Create a new file SchobbesTIRBuilder.g
in
...hobbes/frontend
. Create a new test class
SchobbesTIRBuilderTest
.
A tree parser looks just a bit different:
tree grammar SchobbesTIRBuilder; options { language=Java; tokenVocab=Schobbes; // based on the name of the lexer-parser grammar ASTLabelType=CommonTree; } @header { package edu.calvin.jdfrens.cs214.interpreters.hobbes.frontend; // to build TIRs, you need to import the TIR classes! import org.norecess.citkit.tir.*; import org.norecess.citkit.tir.expressions.*; import org.norecess.citkit.tir.lvalues.*; }
Copy this code into SchobbesTIRBuilder.g
without
the comments.
Again, ANTLR will complain because you haven't given it any tree-parser rules. Here's a very minimal rule:
program returns [ExpressionTIR tir] : ;
This will probably be the last time I remind you to generate the code from your grammar files and refresh your Eclipse project. You have to do this every time you make a change to a grammar file.
Add this to the end of SchobbesTIRBuilder.g
.
Generate code from this grammar in ANTLRWorks, refresh your Eclipse
project, and look at the code.
This is a good point to stop and commit your files to a version
control system. The grammars generate four files:
Schobbes.tokens
, SchobbesLexer.java
,
SchobbesParser.java
, and
SchobbesTIRBuilder.java
. Since these files are
generated, there's little reason to store them in your repo (and
good reasons not to). Ignore those four files in the
...hobbes.frontend
package, and commit all of your
other changes including the .g
files.
Back to the Top Level
Run your CIAT tests.
The Schobbes tests should fail because of the
SchobbesFrontEnd#buildTIR()
method. Now that you have
classes for the lexer, parser, and tree builder, you can write this
method. Or just copy it:
public ExpressionTIR buildTIR() throws RecognitionException, IOException { String name = myFile.getAbsolutePath(); // create the lexer to read characters from a file SchobbesLexer lexer = new SchobbesLexer(new ANTLRFileStream(name)); // create a parser from the lexer SchobbesParser parser = new SchobbesParser(new CommonTokenStream(lexer)); // use the parser's "program" rule to parse the input Tree ast = parser.program().tree; // create a TIR builder from the parser's AST SchobbesTIRBuilder builder = new SchobbesTIRBuilder( new CommonTreeNodeStream(ast)); // build the TIR ExpressionTIR tir = builder.program(); // throw an exception if there are any errors if (anyErrors(lexer, parser, builder)) { throw new RuntimeException("There were syntax errors!"); } return tir; } // each stage of the front end is a "base recognizer", // and each recognizer keeps track of the syntax errors it encounters private boolean anyErrors(BaseRecognizer... stages) { for (BaseRecognizer stage : stages) { if (stage.getNumberOfSyntaxErrors() > 0) { return true; } } return false; }
The constructors and interface to the lexer, parser, and tree parser are a bit awkward with ANTLR.
Why delete the comments? Because they're here for pedagogical reasons. As soon as they get into your code base, they're in production. If you even need to try to understand this code again, you can come back here again.
Define an instance variable (myFile
), and
initialize it in the constructor. Replace your
buildTIR()
with this version, and add the
anyErrors()
method. Delete the comments, and add the
same throws
declaration in the interface.
Run the CIAT tests.
The complaint now will be that for any input you try to give the
Schobbes front end, it wants to find an h
.
Back to the Lexer
Why an h
? Because that's what you have the parser
looking for!
program : 'hey' EOF ;
This says that when the program
rule (or production) is called,
the lexer needs to return the text hey
followed by the
end-of-file.
You'll fix this production in just a second, but let's start with the lexer. It's the lexer that's supposed to recognize and identify an integer. Let's start with a really simple test and lexer.
Create a test method shouldRecognizeIntegers()
in
SchobbesLexerTest
. Also create a set up method that
initializes an instance variable (myTester
) to be a
new ANTLRTester(new SchobbesFrontEndFactory())
.
The ANTLRTester
class comes from the ANTLR Testing
library. SchobbesFrontEndFactory()
is something you'll
have to define.
The IANTLRFrontEnd
interface is misnamed. It should
be called IANTLRFrontEndFactory
since that more
closely reflects its purpose.
Create a class SchobbesFrontEndFactory
in the
test
folder that implements the
IANTLRFrontEnd
interface. Implement the three methods
required by the interface with "unimplemented" exceptions.
Now you can finally write an assertion:
assertToken(SchobbesParser.INTEGER, "5", myTester.scanInput("5"));
assertToken()
and assertTree()
(later
in this iteration) come from the ANTLR Testing library. A static
import of org.norecess.antlr.Assert.*
is really
helpful.
This assertion says that when the input 5
is
scanned by the lexer, it should return a token
whose type is "integer" and its text is 5
. Tokens are
the objects that a lexer returns. A token is a collection of
characters that form a "word" in the source language.
Before you add any more assertions, let's get this one working.
The compiler is complaining because INTEGER
isn't a
valid token type. ANTLR implements these token types as
int
constants in the lexer and parser classes. In
order to get a token type created, you need to declare it as a
lexer rule.
To get this one assertion to pass, you need a lexer rule that
recognizes 5
as an integer. Or, to state it more
declaratively, an integer is the text 5
.
INTEGER : 'a' ;
Of course, you can't just jump directly to a solution without
checking to make sure that the assertion fails. So this lexer rule
says that an INTEGER
is the text a
.
Schobbes.g
has your "grammar" in it.
SchobbesTIRBuilder.g
has your "tree grammar" or
"tree-parser grammar" or "tree parser" in it.
Add this rule to your grammar, and red bar.
The error will be that you aren't creating a lexer in
SchobbesFrontEndFactory
:
public Lexer createLexer(String input) { return new SchobbesLexer(new ANTLRStringStream(input)); }
Replace this method in SchobbesFrontEndFactory
.
Still a red bar.
Read the error message carefully. You should see indicates that
the lexer didn't like the text 5
.
Fix the INTEGER
production. Green bar.
Unfortunately...
Run the CIAT tests. Red bar.
You should have two problems. First, your Schobbes front end
won't like the newline character at the end of the source file.
Second, the parser still wants to see the text
hey
!
Ignoring Whitespace
Create a test method shouldIgnoreWhitespace()
in
SchobbesLexerTest
. Duplicate the assertion from
shouldRecognizeIntegers()
, but change the scanned
input to "5\n"
. Red bar.
Do not change the first "5"
in the assertion,
only change the scanned input. The whole idea here is that
the "5\n"
is scanned and the newline thrown away.
The solution is to recognize the newline character in the lexer, and then skip it.
WHITESPACE : '\n' { skip(); } ;
The curly braces here enclose Java code that says the token just
recognized should be ignored. So if you go scanning
"5\n"
, you'll only get the INTEGER
token
back.
Add this to the Schobbes grammar. Green bar.
CIAT tests still red bar.
The CIAT test should just complain about 5
not
being hey
.
The Parser
Open SchobbesParserTest
. Create
myTester
and a set up method as you did for
SchobbesLexerTest
. Create a test method
shouldParseInteger()
.
Here's an assertion for parsing:
assertTree( SchobbesParser.INTEGER, "(5)", myTester.scanInput("5").parseAs("program") );
This asks the ANTLR tester to scan the input 5
and
give the token to the parser, and the parser should parse the
tokens as a program (because that's the name of the one parser rule
you currently have). The parser returns an AST. The root of this
tree should be an INTEGER
, and as a string it will
look like (5)
. (Why there are parentheses around the 5
will become clear in a later iteration.)
Add this assertion. Red bar.
The red bar is because this method is poorly defined in
SchobbesFrontEndFactory
:
public Parser createParser(TokenStream tokens) { return new SchobbesParser(tokens); }
Replace the method, and still red bar.
That's because the production still says this:
program : 'hey' EOF ;
This rule says that a program
consists of the text
hey
followed by the end-of-file. You could
replace 'hey'
with '5'
, but then it won't
be tagged as an INTEGER
. Instead, use
INTEGER
itself, replacing 'hey'
.
Replace 'hey'
with INTEGER
. Red
bar.
The error message you'll get won't be of much help. The problem
is that as written, you're actually returning two ASTs:
the INTEGER
and the EOF
. You
don't need that end-of-file indicator any more, so tell ANTLR to
ignore it with the !
operator.
Replace EOF
with EOF!
in the
program
rule. Green bar.
Run the CIAT tests. Red bar.
The problem is now the TIR builder.
The TIR Builder
The CIAT test should give you a null-pointer exception. This is
because the SchobbesTIRBuilder
returns
null
by default. The first step (as always) is to get
a failing unit test.
Create an ANTLRTester
instance variable and setup
method in SchobbesTIRBuilderTest
as you did for the
two other test classes. Create a test method
shouldBuildIntegerTIR()
.
Here's an assertion:
assertEquals( new IntegerETIR(5), myTester.scanInput("5").parseAs("program").treeParseAs("program") );
The use of myTester
is the same as before.
ANTLRTester
objects use a fluent
interface to do the heavy lifting and stay readable. This
time you can use the standard assertEquals()
because
the tree builder returns real objects that you want to manipulate
in Java. In this case, you're getting back an
IntegerETIR
.
Add this assertion. Red bar.
The failure is this missing implementation:
public TreeParser createTreeParser(Tree ast) { return new SchobbesTIRBuilder(new CommonTreeNodeStream(ast)); }
Replace this method in SchobbesFrontEndFactory
. Red
bar.
Now the failure is due to the TIR builder itself. You have this one rule:
program returns [ExpressionTIR tir] : ;
The [ExpressionTIR tir]
specifies what data you're
going to return from this rule. (You can, in fact, return more than
one thing.) You need to make Java declarations in the square
brackets, so this is saying you have a Java variable
tir
to assign a value to.
But first, you have to recognize an integer. Here in this tree parser, an integer still goes by the same name!
program returns [ExpressionTIR tir] : INTEGER ;
Modify the rule in the TIR builder. Red bar, same reason.
Keep in mind that program
rule returns a TIR only
if you put one in tir
. As you saw in the rule for
WHITESPACE
, you can add Java code in curly braces at
the end of a rule. For WHITESPACE
, you had to know
that skip()
was a method you could call freely. Here,
you just need to know that you should assign a value to
tir
. What you assign to it can be any valid Java
expression.
program returns [ExpressionTIR tir] : INTEGER { tir = new IntegerETIR(5); } ;
Seems a bit like cheating... But the tests should be fine!
Add this Java code to the program
rule. Green
bar.
And the kicker:
Run the CIAT tests. They all pass!
Deep Breath
There's a lot going on in this iteration, and you don't have much you can show off for it. You have a lot of new code, but it has a really limited audience. You have, however, set up quite a few things for future iterations that will make them a bit easier.
Whatever shortcuts you've taken so far (like hardcoding a 5 in the TIR builder) will be fixed as you push through the story that motivated this work.