4.2 A Simple Parser Example

Susan edited this page May 8, 2018 · 11 revisions

The Lexer Grammar Used to Create an Example Lexer

A parser applies grammatical analysis to the output of an associated lexer. To build a parser example, I first need a better lexer example: the lexer grammar from the previous page is so simple that it doesn't even recognize spaces or line endings (carriage returns and line feeds).

Here's the updated version of the previous lexer grammar; this is the one which I actually used to generate a lexer.

AND     : 'and';
LETTER  : [a-z];
SPACE   : ' ';
WINNL   : [\r\n];
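
To make these four rules concrete, here is a minimal Python sketch that imitates what a lexer built from them does. It is a hand-written stand-in, not code generated by ANTLR, and the names `RULES` and `tokenize` are hypothetical, chosen only for illustration. It assumes ANTLR's usual tie-breaking: the longest match wins, and when two rules match the same length, the rule listed first wins.

```python
import re

# Hand-written imitation of the lexer grammar above; NOT ANTLR-generated code.
# Rule order mirrors the grammar because the earlier rule wins a tie.
RULES = [
    ("AND", re.compile(r"and")),
    ("LETTER", re.compile(r"[a-z]")),
    ("SPACE", re.compile(r" ")),
    ("WINNL", re.compile(r"[\r\n]")),
]


def tokenize(text):
    """Return a list of (token_type, text) pairs for the input string."""
    tokens = []
    pos = 0
    while pos < len(text):
        best = None
        for name, pattern in RULES:
            match = pattern.match(text, pos)
            # Keep the longest match; on equal length the earlier rule stays.
            if match and (best is None or match.end() > best[1].end()):
                best = (name, match)
        if best is None:
            # Simplification: a real ANTLR lexer reports the error and carries on.
            raise ValueError(f"no lexer rule matches {text[pos]!r} at position {pos}")
        tokens.append((best[0], best[1].group()))
        pos = best[1].end()
    return tokens


if __name__ == "__main__":
    for token in tokenize("dandy and z\n"):
        print(token)
```

In this sketch the word "dandy" comes out as three tokens (LETTER, AND, LETTER) because `and` is matched wherever it appears, even inside a longer word. Also note that `[\r\n]` matches a single character, so a Windows line ending `\r\n` produces two WINNL tokens.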

What Does a Parser Do?

The function of a parser is to group one or more adjacent tokens in the token sequence produced by a lexer and to provide useful labels for different kinds of groups. That's basically all a parser does. ANTLR 4 parsers do not change the text associated with a token and do not rearrange the order of tokens. (Those are best done in the third step.)

The Parser Grammar Used to Create a Parser to Process the Output of the Example Lexer

What might a parser do with the output from the simple lexer created from the above lexer grammar? One option is to have it provide different labels for each occurrence of "and" as a standalone word, for each isolated letter, and for each occurrence of "and" as part of another word.

Here's the simple parser grammar that I actually used to generate a parser that provides the three suggested labels. It also handles input with more than one line, labels each line, and labels the overall document.

document  : line+;

line      : ( item ) ( SPACE+ item )* SPACE* WINNL
          |  SPACE* WINNL
          ;

item            : andAsWholeWord 
                | letter 
                | andAsPartWord 
                ; 
         
andAsWholeWord  : AND;
letter          : LETTER;
andAsPartWord   : AND LETTER+
                | LETTER+ AND 
                | LETTER+ AND LETTER+
                ;

[The + sign in the "andAsPartWord" rule means one or more: at least one token of that type is required at that position, and the parser includes as many more adjacent tokens of that type as the input provides. The | separates alternatives. The "andAsPartWord" rule has three alternatives, but only one is used for any given match. The * means zero or more: tokens of that type aren't required, but the parser includes as many as the input provides.]
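
The grouping and labeling that these rules describe can be sketched in plain Python. This is only an illustration of the idea, not how an ANTLR-generated parser works internally, and the function names `label_item` and `parse_line` are hypothetical. It assumes one line's tokens arrive as (type, text) pairs ending in a single WINNL token, as the lexer grammar above would produce them.

```python
def label_item(types):
    """Pick the `item` alternative that matches one run of token types."""
    if types == ["AND"]:
        return "andAsWholeWord"
    if types == ["LETTER"]:
        return "letter"
    # andAsPartWord: exactly one AND with LETTER+ before it, after it, or both.
    only_word_tokens = all(t in ("AND", "LETTER") for t in types)
    if only_word_tokens and len(types) > 1 and types.count("AND") == 1:
        return "andAsPartWord"
    raise ValueError(f"no item alternative matches {types}")


def parse_line(tokens):
    """Group one line's (type, text) tokens into labeled items, like `line`."""
    if not tokens or tokens[-1][0] != "WINNL":
        raise ValueError("a line must end with a WINNL token")
    runs, current = [], []
    for tok_type, text in tokens[:-1]:
        if tok_type == "SPACE":
            # SPACE+ separates items; it never becomes part of one.
            if current:
                runs.append(current)
                current = []
        else:
            current.append((tok_type, text))
    if current:
        runs.append(current)
    # The first alternative of `line` starts with an item, not with SPACE.
    if runs and tokens[0][0] == "SPACE":
        raise ValueError("leading spaces are only allowed on an otherwise blank line")
    return [
        (label_item([t for t, _ in run]), "".join(s for _, s in run))
        for run in runs
    ]


if __name__ == "__main__":
    line = [
        ("LETTER", "d"), ("AND", "and"), ("LETTER", "y"), ("SPACE", " "),
        ("AND", "and"), ("SPACE", " "), ("LETTER", "z"), ("WINNL", "\n"),
    ]
    for label, text in parse_line(line):
        print(label, text)
```

As in the grammar, the sketch leaves the token text and token order unchanged; it only groups adjacent tokens and attaches a label to each group. A run such as two LETTER tokens with no AND matches none of the three alternatives, so the sketch rejects it.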

A Parse Tree Created by the Parser

As I already pointed out, ANTLR 4 parsers don't change the text that was input to and then output from the lexer. Nonetheless, the parser output is a very useful data structure, known as a parse tree, that represents how the parser has grouped and labeled the input. In the next few pages I'll explain the key third step, which processes a parse tree to transform or translate the starting input into something different! Meanwhile, here's a visual display of a parse tree. (This display was generated from the example lexer and parser by an auxiliary ANTLR 4 tool.)

[Image: Parse Tree]

This particular parse tree was the parser output for this input to the lexer:

dandy and z
andiron stand and
