CS 214: Programming Languages
Spring 2009

Ignoring Whitespace
FrontEnd, Iteration 2

In the previous iteration, the CIAT tests forced us to deal with some whitespace. In this iteration, let's finish the job and handle whitespace completely.

5s with Whitespace

Write three CIAT tests for Hobbes that explore the whitespace handling of the Hobbes front end.

Use only the number 5 in these programs so that you can reuse them for Schobbes tests. Use a variety of spaces, tabs, and newlines in the source programs.

The key question, though, is what the output should be. Hint: the Hobbes front end ignores all whitespace.

CIAT tests: green bar.

Copy these tests into the acceptance/ciat/schobbes/ folder. Red bar.

You should get complaints that spaces and tabs aren't recognized, but newlines are handled just fine.

Fixing the Grammar

First, a unit test with some assertions.

In SchobbesLexerTest, add at least these assertions to shouldIgnoreWhitespace():

  1. the input is a 5 surrounded by spaces
  2. the input is a 5 surrounded by tabs
  3. the input is a 5 followed by various whitespace
  4. the input is a 5 preceded by various whitespace
  5. the input is a 5 surrounded by whitespace

Red bar.

Hint: use \n for a newline, a literal space character for a space, and \t for a tab.
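If it helps to see the hint in Java terms, here is a sketch of one possible version of each of the five inputs, written as Java string literals. Your real assertions go through the course's ANTLR testing API, which isn't shown here; the tokenize() method below is a hypothetical, hand-written stand-in for the finished lexer (not ANTLR-generated code), and it assumes the only thing the lexer must recognize besides whitespace is a run of digits.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the finished Schobbes lexer; the real one is
// generated by ANTLR from your grammar. It skips spaces, tabs, and newlines
// and turns each run of digits into a single token.
class WhitespaceInputs {
    // One possible version of each input from the list, as Java string literals.
    static final String[] INPUTS = {
        " 5 ",           // 1. surrounded by spaces
        "\t5\t",         // 2. surrounded by tabs
        "5 \t\n \n\t",   // 3. followed by various whitespace
        "\n \t\t 5",     // 4. preceded by various whitespace
        " \n\t5\t \n ",  // 5. surrounded by whitespace
    };

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == ' ' || c == '\t' || c == '\n') {
                i++;  // whitespace: discard it, as skip() would
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < input.length() && Character.isDigit(input.charAt(i))) {
                    i++;
                }
                tokens.add(input.substring(start, i));  // one token per run of digits
            } else {
                throw new IllegalArgumentException("unrecognized character: " + c);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String input : INPUTS) {
            System.out.println(tokenize(input));  // prints [5] for every input
        }
    }
}
```

Whatever whitespace surrounds it, each input should produce exactly one token: the 5.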

You're already ignoring newlines from this definition:

WHITESPACE
  : '\n'
    { skip(); }
  ;

When a lexer rule calls skip(), the token it just matched is thrown away instead of being passed along to the parser. So every newline in your input is ignored. It seems logical, then, to add the other whitespace characters to this lexer rule.

Expressed in English, what you want is this: if the input character is a space or a tab or a newline, skip it. The "or" relationship in an ANTLR lexer is accomplished with the alternation operator, |. If I wanted to ignore the letters a, b, and c, I'd write ('a' | 'b' | 'c').

Ignore the space character. Red bar, but the "surrounded by spaces" assertion should pass.
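To see why only the first assertion turns green at this point, here is a small Java simulation. It is a hypothetical stand-in, not the ANTLR-generated lexer: the string of characters passed as skipped plays the role of the alternation in the WHITESPACE rule, and the only other character it recognizes is the digit 5. It assumes the same five example inputs as before.

```java
// Hypothetical simulation, not the ANTLR-generated lexer: the characters in
// `skipped` play the role of the alternation in the WHITESPACE rule, and the
// only other character recognized is the digit 5.
class GrowingAlternation {
    static final String[] INPUTS = {
        " 5 ", "\t5\t", "5 \t\n \n\t", "\n \t\t 5", " \n\t5\t \n ",
    };

    // True if the input lexes to exactly one 5 when only `skipped` is ignored.
    static boolean lexesToFive(String input, String skipped) {
        int fives = 0;
        for (char c : input.toCharArray()) {
            if (skipped.indexOf(c) >= 0) {
                continue;           // matched by WHITESPACE, then skip()
            } else if (c == '5') {
                fives++;
            } else {
                return false;       // unrecognized character: a lexer error
            }
        }
        return fives == 1;
    }

    // How many of the five inputs pass with the given alternation.
    static int passing(String skipped) {
        int count = 0;
        for (String input : INPUTS) {
            if (lexesToFive(input, skipped)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(passing("\n"));     // ('\n') alone: 0
        System.out.println(passing(" \n"));    // (' ' | '\n'): 1
        System.out.println(passing(" \t\n"));  // (' ' | '\t' | '\n'): 5
    }
}
```

With these particular inputs, only the space-only input passes once the space is added, because every other input contains at least one tab.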

Make sure you put parentheses around the alternation. They're needed here because you want the Java code to be attached to the whole alternation, not just to its last option.
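Here's a hypothetical Java simulation of the difference the parentheses make. In the ungrouped version, spaces and tabs still match WHITESPACE, but because the { skip(); } action belongs only to the '\n' alternative, they come out as WHITESPACE tokens instead of being thrown away. The token name INTEGER is made up for this sketch; use whatever your grammar calls it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation (not generated ANTLR code) of two versions of the rule:
//   grouped:   WHITESPACE : (' ' | '\t' | '\n') { skip(); } ;
//   ungrouped: WHITESPACE :  ' ' | '\t' | '\n'  { skip(); } ;
// In the ungrouped version the action belongs only to the '\n' alternative.
// INTEGER is an assumed token name for the 5.
class ActionPlacement {
    static List<String> tokens(String input, boolean grouped) {
        List<String> out = new ArrayList<>();
        for (char c : input.toCharArray()) {
            if (c == '\n') {
                continue;                   // '\n' is skipped in both versions
            } else if (c == ' ' || c == '\t') {
                if (!grouped) {
                    out.add("WHITESPACE");  // matched, but no skip(): token is emitted
                }
            } else if (c == '5') {
                out.add("INTEGER");
            } else {
                throw new IllegalArgumentException("unrecognized character: " + c);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String input = " 5\t\n";
        System.out.println(tokens(input, true));   // [INTEGER]
        System.out.println(tokens(input, false));  // [WHITESPACE, INTEGER, WHITESPACE]
    }
}
```

Those stray WHITESPACE tokens would then be handed to the parser, which presumably has no rule expecting them.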

Ignore the tab character. Green bars all around.