![]() |
CS 382: Special Topics - Compilers Fall 2004 |
||||||
Unit Testing & ANTLR: The Parser |
|||||||
|
Truth be told, the testing proposed here for a parser is probably more of a cross between unit and integration testing. We'll be used the scanner to generate tokens for the parser, and strict unit testing would suggest that we hide these details away. So if anyone complains, you'll know why. Parsing the tokens from the scanner results in an abstract syntax tree (AST) which more accurately represents the structure of the program. A scanner picks up the individual words, and parsing organizes those words and relates them together in a structure. Let's start simple with the spreadsheet parser. First, I define
a My
public TestableParser makeParser(TokenStream lexer) {
return new SpreadsheetParser(lexer);
}
The Now, in the grammar file, I can start writing my grammar. A
parser must be declared after the
class SpreadsheetParser extends Parser;
options {
buildAST=true;
classHeaderSuffix=org.norecess.antlr.TestableParser;
}
The first line looks like the first line of the lexer, just with
the substitutions you'd expect. The first option has your parser
build an AST. (You might not want an AST if the actions in your
grammar did all of the work you needed.) The second option here
adds an Now my compiler complains about some methods that
{
private boolean myFailure = false;
private boolean myQuietErrors = false;
public void reportError(RecognitionException ex) {
if (shouldPrintErrors())
super.reportError(ex);
myFailure = true;
}
public boolean successfulParse() { return !myFailure; }
public void atEOF() {
try {
match(EOF);
} catch (Exception e) {
myFailure = true;
}
}
public void setQuietErrors(boolean quietErrors) {
myQuietErrors = quietErrors;
}
private boolean shouldPrintErrors() {
return !myQuietErrors;
}
}
This code is all added to The first public method here is overriding a method from ANTLR's
The other public methods are all required by the
Now the code compiles. Finally I can start testing. I'm going to start simple: recognize an integer literal as an equation. First my test:
public void testIntegerEquation() {
assertNode(INTEGER, "123", produce("equation", "123"));
}
This doesn't compile because of the Note the call to
I also have to add at least one rule in my grammar, so I add this:
equation
: EOF
;
When I run my tests (since the code now compiles), my new test
fails as I expect. To fix it, I add the Strangely, I've fixed the one failure---my integer parses just
fine; but two old tests fail in the lexer! By using the
I add another assertion to test other integer expressions. The assertions so far only test the root of the AST that's
returned. If the input is an integer, the only node in the
AST should be the integer token. In turns out, that's not the case
yet.
public void testIntegerSingleton() {
assertNoNextSibling(produce("equation", "123"));
assertNoFirstChild(produce("equation", "123"));
}
This tests to make sure that the node returned by the
equation
: INTEGER EOF!
;
Now the This leads to another observation: So, next, I want to write some addition expressions. I start with my tests:
public void testAdditionEquation() {
assertNode(OPERATOR, "+", produce("equation", "1+2"));
}
To define this new node type, I add a
tokens {
OPERATOR;
}
Now my code is happy, and I get the expected red bar from my tests. The fix for the red bar is to add a production to the
equation
: INTEGER EOF!
| INTEGER PLUS INTEGER EOF!
;
If I run my code with this declaration, I still get a red bar. I've ignored a warning from ANTLR: these two productions are ambiguous. Currently ANTLR is using just a one-token lookahead. But that's not enough to figure out which production to follow when the next token in an integer. It could be a single integer or an integer followed by a plus sign. One solution would be to left factor the language. I loose some
structure in the grammar, so I prefer to increase my lookahead. I
do this in the The problem is that I'm not yet returning an
: INTEGER PLUS^ INTEGER EOF! My tests still red bar, but at least it's complaining
about the
| INTEGER p:PLUS^ INTEGER EOF!
{ #p.setType(OPERATOR); }
The My unit tests pass! One thing bothers me now: I've had to deal with that
equation
: e EOF!
;
e : INTEGER
| INTEGER p:PLUS^ INTEGER
{ #p.setType(OPERATOR); }
;
My unit tests still pass: the So I add yet another assertion for addition equations:
assertNode(OPERATOR, "+", produce("equation", "1+2+3"));
It fails. There are many ways to solve this. One option
would be to turn one or the other ANTLR allows regular expressions in these productions. To handle
e : INTEGER
| INTEGER ( p:PLUS^ INTEGER { #p.setType(OPERATOR); } )+
;
The positive closure allows the input to have multiple additions in the same expression. Each plus sign ultimately becomes the root of a new AST. Since the parsing is done left to right, the first plus sign has the first two integers as its children; the second plus sign has the first AST and the third integer as its children; the third plus sign has the second AST and the fourth integer ast its children; and so on... That's exactly what we want since addition is left associative! I'm not sure I believe this, so I write tests:
public void testAdditionEquation() {
assertNode(OPERATOR, "+", produce("equation", "1+2"));
assertNode(OPERATOR, "+", produce("equation", "1+2+3"));
assertNode(OPERATOR, "+", produce("equation", "1+2+3").getFirstChild());
assertNode(INTEGER, "1", produce("equation", "1+2+3").getFirstChild()
.getFirstChild());
assertNode(INTEGER, "2", produce("equation", "1+2+3").getFirstChild()
.getFirstChild().getNextSibling());
assertNode(INTEGER, "3", produce("equation", "1+2+3").getFirstChild()
.getNextSibling());
}
[[As you might be able to tell, this is where my supporting code lacks the most: deep comparisons of ASTs!]] I'm golden! What about substraction?
public void testSubtractionEquation() {
assertNode(OPERATOR, "-", produce("equation", "1-2"));
assertNode(OPERATOR, "-", produce("equation", "1+2-3"));
assertNode(OPERATOR, "+", produce("equation", "1+2-3").getFirstChild());
assertNode(INTEGER, "1", produce("equation", "1+2-3").getFirstChild()
.getFirstChild());
assertNode(INTEGER, "2", produce("equation", "1+2-3").getFirstChild()
.getFirstChild().getNextSibling());
assertNode(INTEGER, "3", produce("equation", "1+2-3").getFirstChild()
.getNextSibling());
}
Actually, this also tests addition. Regardless, I get a red bar. The solution is bulky:
e : INTEGER
| INTEGER
(
( p:PLUS^ INTEGER { #p.setType(OPERATOR); }
| m:MINUS^ INTEGER { #m.setType(OPERATOR); }
)
)+
;
Alternation over the two operators doesn't work because you can't name them properly; making them roots must be done immediately. But this does green bar. |
|||||||
|