4.2 A Simple Parser Example
A parser applies grammatical analysis to the output of an associated lexer. To create a parser example I first need a better lexer example, because the lexer grammar from the previous page is so simple that it doesn't even recognize spaces and line endings (carriage returns and line feeds).
Here's the updated version of the previous lexer grammar; this is the one which I actually used to generate a lexer.
AND : 'and';
LETTER : [a-z];
SPACE : ' ';
WINNL : [\r\n];
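To make the token sequence concrete, here is a rough Python imitation of what a lexer generated from this grammar produces. It is only a sketch and not ANTLR itself: the names `TOKEN_RULES`, `MASTER`, and `tokenize` are made up for illustration, and it raises an exception on an unrecognized character, whereas a real ANTLR lexer reports an error and carries on. A regular-expression alternation takes the first alternative that matches, which agrees with ANTLR's longest-match behavior here only because `and` is listed first and is the only multi-character rule.

```python
import re

# Token rules in the same order as the lexer grammar above.
TOKEN_RULES = [
    ("AND", r"and"),
    ("LETTER", r"[a-z]"),
    ("SPACE", r" "),
    ("WINNL", r"[\r\n]"),
]

# One combined pattern with a named group per rule.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_RULES))


def tokenize(text):
    """Return (token_type, text) pairs; raise if no rule matches a character."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"no lexer rule matches {text[pos]!r} at offset {pos}")
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


for token_type, token_text in tokenize("dandy and z\n"):
    print(token_type, repr(token_text))
```

For the input `dandy and z` followed by a newline, this yields LETTER, AND, LETTER, SPACE, AND, SPACE, LETTER, WINNL. Note that the `and` inside `dandy` becomes an AND token just like the standalone word; telling the two cases apart is left to the parser.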
The function of a parser is to group one or more adjacent tokens in the token sequence produced by a lexer and to provide useful labels for different kinds of groups. That's basically all a parser does. ANTLR 4 parsers do not change the text associated with a token and do not rearrange the order of tokens. (Those are best done in the third step.)
What might a parser do with the output from the simple lexer created from the above lexer grammar? One option is to have it provide different labels for each occurrence of "and" as a standalone word; for each isolated letter; and for each occurrence of "and" as part of another word.
Here's the simple parser grammar that I actually used to generate a parser that provides the three suggested labels. It also handles input with more than one line, labels each line, and labels the overall document.
document : line+;
line : ( item ) ( SPACE+ item )* SPACE* WINNL
| SPACE* WINNL
;
item : andAsWholeWord
| letter
| andAsPartWord
;
andAsWholeWord : AND;
letter : LETTER;
andAsPartWord : AND LETTER+
| LETTER+ AND
| LETTER+ AND LETTER+
;
[The + sign in the "andAsPartWord" rule means that at least one token of that type is required in a group of adjacent tokens, and that the parser is to include as many more tokens of that type as there are in that portion of the input. The | means a choice: the "andAsPartWord" rule has three choices, but only one is used at a time. The * (used in the "line" rule) means zero or more: items of that type aren't required, but as many as the input provides are included.]
As I already pointed out, ANTLR 4 parsers don't change the text that was input to and then output from the lexer. Nonetheless, the parser output is a very useful data structure, known as a parse tree, that represents how the parser has grouped and labeled the input.
In the next few pages I'll explain the key third step, which processes a parse tree to transform or translate the starting input into something different! Meanwhile here's a visual display of a parse tree. (This display was generated from the example lexer and parser by an auxiliary ANTLR 4 tool.)
This particular parse tree was the parser output from this input to the lexer:
dandy and z
andiron stand and
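For readers who want a textual view of the grouping and labeling, the following Python sketch imitates what the parser grammar does with this input. It is a hand-written approximation, not code generated by ANTLR, and the function names (`tokenize`, `label_item`, `parse_document`, `render`) are hypothetical. It assumes every input character is covered by a lexer rule and that the input ends with a line ending. It is also more lenient than the grammar (for example, it tolerates leading spaces on a line), and its outline omits the intermediate `item` nodes and the SPACE and WINNL leaves that a real parse tree contains.

```python
import re

# The four lexer rules, tried in grammar order.
TOKEN = re.compile(r"(?P<AND>and)|(?P<LETTER>[a-z])|(?P<SPACE> )|(?P<WINNL>[\r\n])")


def tokenize(text):
    # Assumes every character matches one of the four rules.
    return [(m.lastgroup, m.group()) for m in TOKEN.finditer(text)]


def label_item(run):
    """Pick the parser rule name for one run of adjacent AND/LETTER tokens."""
    kinds = [kind for kind, _ in run]
    if kinds == ["AND"]:
        return "andAsWholeWord"
    if kinds == ["LETTER"]:
        return "letter"
    # andAsPartWord: exactly one AND with LETTER+ before it, after it, or both.
    if kinds.count("AND") == 1 and len(kinds) > 1:
        return "andAsPartWord"
    raise ValueError(f"no 'item' alternative matches {kinds}")


def parse_document(text):
    """Return a list of lines, each a list of (label, text) items."""
    lines, items, run = [], [], []
    for kind, value in tokenize(text):
        if kind in ("AND", "LETTER"):
            run.append((kind, value))
            continue
        if run:  # SPACE or WINNL ends the current item
            items.append((label_item(run), "".join(v for _, v in run)))
            run = []
        if kind == "WINNL":  # WINNL also ends the current line
            lines.append(items)
            items = []
    return lines


def render(lines):
    """Format the grouped items as an indented outline."""
    out = ["document"]
    for items in lines:
        out.append("  line")
        out.extend(f"    {label}: {text}" for label, text in items)
    return "\n".join(out)


print(render(parse_document("dandy and z\nandiron stand and\n")))
```

Run on the two input lines above, the outline shows a `document` containing two `line` entries. The first line holds `andAsPartWord: dandy`, `andAsWholeWord: and`, and `letter: z`; the second holds `andAsPartWord: andiron`, `andAsPartWord: stand`, and `andAsWholeWord: and`.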