Anglr File Structure

Introduction

An anglr file is composed of several parts. This division keeps functionally different parts of the file apart. A part can be referenced from other parts where that makes sense, and such references are created using attributes. How particular parts are referenced is explained once the parts themselves have been introduced.

Syntax

The following syntax rules define the structure of an anglr file:

RULE F-1

<anglr file>
            : <anglr file part list> ?
            ;

RULE F-2

<anglr file part list>
            : <anglr file part> +
            ;

RULE F-3

<anglr file part>
            : <general part>
            | <declaration part>
            | <scanner part>
            | <lexer part>
            | <parser part>
            ;
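As an illustration only, rules F-1 to F-3 can be mirrored by a small Python recognizer that treats an anglr file as a sequence of part-kind keywords and ignores attributes and part contents. The names `PartKind`, `is_anglr_file_part`, `is_anglr_file_part_list` and `is_anglr_file` are hypothetical and are not part of anglr; the keyword spellings follow the `%part-type` tokens described at the end of this page.

```python
from enum import Enum
from typing import Sequence


class PartKind(Enum):
    """RULE F-3: the five alternatives of <anglr file part>."""
    GENERAL = "general"
    DECLARATIONS = "declarations"
    SCANNER = "scanner"
    LEXER = "lexer"
    PARSER = "parser"


PART_KEYWORDS = frozenset(kind.value for kind in PartKind)


def is_anglr_file_part(token: str) -> bool:
    """RULE F-3: a part is exactly one of the five part kinds."""
    return token in PART_KEYWORDS


def is_anglr_file_part_list(tokens: Sequence[str]) -> bool:
    """RULE F-2: one or more parts in a row, with no separator tokens."""
    return len(tokens) > 0 and all(is_anglr_file_part(t) for t in tokens)


def is_anglr_file(tokens: Sequence[str]) -> bool:
    """RULE F-1: the part list is optional, so an empty file is valid."""
    return len(tokens) == 0 or is_anglr_file_part_list(tokens)
```

For instance, `is_anglr_file([])` holds because of the `?` in RULE F-1, while `is_anglr_file_part_list([])` does not because of the `+` in RULE F-2.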

Discussion

RULE F-1 - Structure of Anglr File

Rule F-1 defines the syntax of an anglr file. An anglr file consists of a possibly empty list of anglr file parts.

RULE F-2 - Anglr File Part List

Rule F-2 defines the syntax of the list of anglr file parts. It is simply a nonempty sequence of file parts. There are no special separators between parts; they are separated only by whitespace.

RULE F-3 - Kind of Anglr File Parts

Rule F-3 defines all possible kinds of file parts:

general part: where general settings that apply to the whole file are defined
declaration part: where declarations are specified
scanner part: where scanners are defined. A scanner part can reference any number of declaration parts
lexer part: where the lexical analyzer is defined. A lexer part can reference multiple scanner parts
parser part: where the syntax rules of the syntax analyzer are defined. A parser part can reference declaration parts, scanner parts and lexer parts

An anglr file can contain any number of parts of each kind: how many parts, and of which kinds, depends on the needs of the project that uses the file.
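The reference rules listed for RULE F-3 form a small "may reference" relation between part kinds. The Python sketch below encodes it as a lookup table; `ALLOWED_REFERENCES` and `may_reference` are hypothetical names. The page does not say whether general and declaration parts may reference other parts, so the sketch assumes they reference nothing.

```python
from enum import Enum


class PartKind(Enum):
    GENERAL = "general"
    DECLARATIONS = "declarations"
    SCANNER = "scanner"
    LEXER = "lexer"
    PARSER = "parser"


# Which kinds of parts each kind may reference, as listed for RULE F-3.
# Assumption: general and declaration parts reference nothing.
ALLOWED_REFERENCES: dict[PartKind, frozenset[PartKind]] = {
    PartKind.GENERAL: frozenset(),
    PartKind.DECLARATIONS: frozenset(),
    PartKind.SCANNER: frozenset({PartKind.DECLARATIONS}),
    PartKind.LEXER: frozenset({PartKind.SCANNER}),
    PartKind.PARSER: frozenset(
        {PartKind.DECLARATIONS, PartKind.SCANNER, PartKind.LEXER}
    ),
}


def may_reference(source: PartKind, target: PartKind) -> bool:
    """True if a part of kind `source` may reference a part of kind `target`."""
    return target in ALLOWED_REFERENCES[source]
```

A table keeps the rules in one place, so a checker can report, for example, a lexer part that tries to reference a declaration part directly.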

Example

This example was already shown on the introductory page, but it is repeated here:

[ Description Text='definitions of tokens and regular expressions used to define syntax']
[ Description Text='of simple arithmetic expressions']
[ CompilationInfo ClassName='MathDecls' NameSpace='Math.Declarations' Access='public']
%declarations mathDecls
%{
            %regex
            {
            decimal-digit [0-9]
            number {decimal-digit}+
            }

            %terminal
            {
            NUMBER
            add '+'
            sub '-'
            mul '*'
            div '/'
            lb '('
            rb ')'
            unknown
            }
%}

[ Description Text='definition of scanner, which extracts comments from input string']
[ Declarations Id='mathDecls' ]
[ CompilationInfo ClassName='CommentRegex' NameSpace='Math.ScannerLib' Access='public']
%scanner commentScanner
%{
[\*]+\/
            pop
[\n\r]
            skip
[^\*]+
            skip
[\*]+
            skip
%}

[ Description Text='definition of scanner, which extracts terminal symbols from input string']
[ Declarations Id='mathDecls' ]
[ CompilationInfo ClassName='MathRegex' NameSpace='Math.ScannerLib' Access='public']
%scanner mathScanner
%{
\/\*
            push commentScanner
{number}
            terminal NUMBER
\+
            terminal add
\-
            terminal sub
\*
            terminal mul
\/
            terminal div
\(
            terminal lb
\)
            terminal rb
[ \t]+
            skip
[\n\r]
            skip
.
            skip
%}

[ Description Text='Lexer for anglr file' Hover='true' ]
[
            UseScanner
            ScannerId='commentScanner'
            InitialScanner='mathScanner'
            Hover='true'
]
[ CompilationInfo ClassName='MathLexer' NameSpace='Math.Lexer' Access='public' Hover='true' ]
%lexer mathLexer
%{

%}

[ Description Text='definition of parser for simple arithmetic expressions']
[ Declarations Id='mathDecls' ]
[ Lexer Id='mathLexer' ]
[ CompilationInfo ClassName='MathParser' NameSpace='Math.Parser' Access='public']
%parser mathParser1
%{

[ Start ]
expression
            : additive-expression
            ;

additive-expression
            : multiplicative-expression
            | additive-expression '+' multiplicative-expression
            | additive-expression '-' multiplicative-expression
            ;

multiplicative-expression
            : unary-expression
            | multiplicative-expression '*' unary-expression
            | multiplicative-expression '/' unary-expression
            ;

unary-expression
            : NUMBER
            | '(' expression ')'
            ;

%}
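The two scanner parts in the example cooperate through the `push` and `pop` actions. The Python sketch below emulates that cooperation with a stack of active scanners, using the rules of commentScanner and mathScanner with `{number}` expanded by hand. It is a hypothetical emulation, not the code anglr generates: the stack semantics of `push`/`pop` and the matching policy (longest match, earlier rule wins ties, as in lex) are assumptions, and `tokenize` is an illustrative name.

```python
import re

# Rules of the two scanner parts from the example, in source order.
# {number} is expanded by hand from the %regex definitions in mathDecls.
SCANNERS = {
    "commentScanner": [
        (r"[*]+/", "pop", None),
        (r"[\n\r]", "skip", None),
        (r"[^*]+", "skip", None),
        (r"[*]+", "skip", None),
    ],
    "mathScanner": [
        (r"/\*", "push", "commentScanner"),
        (r"[0-9]+", "terminal", "NUMBER"),
        (r"\+", "terminal", "add"),
        (r"-", "terminal", "sub"),
        (r"\*", "terminal", "mul"),
        (r"/", "terminal", "div"),
        (r"\(", "terminal", "lb"),
        (r"\)", "terminal", "rb"),
        (r"[ \t]+", "skip", None),
        (r"[\n\r]", "skip", None),
        (r".", "skip", None),
    ],
}
COMPILED = {
    name: [(re.compile(p), action, arg) for p, action, arg in rules]
    for name, rules in SCANNERS.items()
}


def tokenize(text: str, initial: str = "mathScanner") -> list[tuple[str, str]]:
    """Return (terminal, lexeme) pairs; skipped text produces no token."""
    stack = [initial]  # the active scanner is the top of the stack
    tokens: list[tuple[str, str]] = []
    pos = 0
    while pos < len(text):
        best = None  # (length, action, arg) of the longest match so far
        for pattern, action, arg in COMPILED[stack[-1]]:
            m = pattern.match(text, pos)
            # Strictly longer wins, so an earlier rule wins ties (lex convention).
            if m and m.end() > pos and (best is None or m.end() - pos > best[0]):
                best = (m.end() - pos, action, arg)
        if best is None:
            raise ValueError(f"no rule of {stack[-1]} matches at position {pos}")
        length, action, arg = best
        if action == "terminal":
            tokens.append((arg, text[pos:pos + length]))
        elif action == "push":
            stack.append(arg)  # activate another scanner
        elif action == "pop":
            stack.pop()  # return to the scanner that pushed this one
        pos += length  # "skip" only consumes the input
    return tokens


print(tokenize("12 + (3 /* note */ * 4)"))
# [('NUMBER', '12'), ('add', '+'), ('lb', '('), ('NUMBER', '3'),
#  ('mul', '*'), ('NUMBER', '4'), ('rb', ')')]
```

The comment text never reaches the token list: mathScanner hands control to commentScanner at `/*`, and commentScanner returns control at `*/`.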

The anglr example above defines:

no general part, since everything the anglr compiler needs is already defined in the parts where it belongs
one declaration part named mathDecls
two scanner parts named commentScanner and mathScanner
one lexer part named mathLexer
one parser part named mathParser1

As we can see, some parts of the anglr file are referenced from other parts using attributes listed immediately before the referencing part. The following referencing relations can be observed:

scanner parts commentScanner and mathScanner reference declaration part mathDecls through the attribute
[ Declarations Id='mathDecls' ].
These referencing relations define the source of the data structures used by the scanners.
lexer part mathLexer references scanner parts commentScanner and mathScanner through the attribute
[ UseScanner ScannerId='commentScanner' InitialScanner='mathScanner' Hover='true' ].
This referencing relation defines the structure of the lexical analyzer, which is built from two scanners. The initial scanner is mathScanner; the other one, commentScanner, is activated when needed (mathScanner pushes it when it reads /*, and it pops itself when it reads the closing */).
parser part mathParser1 references declaration part mathDecls through the attribute
[ Declarations Id='mathDecls' ]
and lexer part mathLexer through the attribute
[ Lexer Id='mathLexer' ].
These referencing relations define the data structures and the lexical analyzer used by the syntax analyzer.
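Each of these relations names a target part by its identifier, so a tool processing the file has to resolve every name to an existing part of the expected kind. The Python sketch below checks the relations of the example in that way; `Reference`, `PARTS`, `REFERENCES` and `unresolved` are hypothetical names, and the part table is written out by hand rather than read from the file.

```python
from typing import NamedTuple


class Reference(NamedTuple):
    source: str         # name of the referencing part
    attribute: str      # attribute property that creates the reference
    target: str         # name of the referenced part
    expected_kind: str  # kind of part the attribute must point to


# Part names and kinds as defined in the example.
PARTS = {
    "mathDecls": "declarations",
    "commentScanner": "scanner",
    "mathScanner": "scanner",
    "mathLexer": "lexer",
    "mathParser1": "parser",
}

# The referencing relations listed above, one entry per referenced name.
REFERENCES = [
    Reference("commentScanner", "Declarations Id", "mathDecls", "declarations"),
    Reference("mathScanner", "Declarations Id", "mathDecls", "declarations"),
    Reference("mathLexer", "UseScanner ScannerId", "commentScanner", "scanner"),
    Reference("mathLexer", "UseScanner InitialScanner", "mathScanner", "scanner"),
    Reference("mathParser1", "Declarations Id", "mathDecls", "declarations"),
    Reference("mathParser1", "Lexer Id", "mathLexer", "lexer"),
]


def unresolved(parts: dict[str, str], references: list[Reference]) -> list[Reference]:
    """Return references whose target part is missing or of the wrong kind."""
    return [r for r in references if parts.get(r.target) != r.expected_kind]
```

For the example, `unresolved(PARTS, REFERENCES)` is empty. A misspelled identifier such as `ScannerId='commentSanner'` would show up in the result, because no part with that name exists.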

The example also shows that all parts of an anglr file are built in a similar way:

each part is preceded by an attribute list
then follows the special token %part-type (where part-type is general, declarations, scanner, lexer or parser) and an identifier that names the part
the content of the part is enclosed in the part brackets %{ and %}. Contents can differ considerably, especially between parts of different kinds.
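This common shape is regular enough that a file can be split into its parts without understanding their contents. The Python sketch below does so with a regular expression; `Part` and `split_parts` are hypothetical names. It assumes that attribute lists contain no `]` characters, that the `%part-type name` header follows them directly, and that the closing `%}` starts a line and no other line inside the content starts with `%}`. It does no error reporting: text that does not fit the pattern is silently skipped.

```python
import re
from typing import NamedTuple


class Part(NamedTuple):
    attributes: list[str]  # attribute lists before the part, whitespace-normalized
    kind: str              # general, declarations, scanner, lexer or parser
    name: str              # identifier following %part-type
    content: str           # text between %{ and %}, stripped


PART_RE = re.compile(
    r"(?P<attrs>(?:\s*\[[^\]]*\])*)"  # attribute list: zero or more [ ... ]
    r"\s*%(?P<kind>general|declarations|scanner|lexer|parser)\s+(?P<name>\w+)"
    r"\s*%\{(?P<content>.*?)^%\}",  # content up to a line starting with %}
    re.DOTALL | re.MULTILINE,
)
ATTR_RE = re.compile(r"\[[^\]]*\]")


def split_parts(text: str) -> list[Part]:
    """Split anglr source text into parts (attributes, kind, name, content)."""
    parts = []
    for m in PART_RE.finditer(text):
        attrs = [" ".join(a.split()) for a in ATTR_RE.findall(m.group("attrs"))]
        parts.append(
            Part(attrs, m.group("kind"), m.group("name"), m.group("content").strip())
        )
    return parts


# Two parts taken from the example above.
SAMPLE = """
[ Description Text='Lexer for anglr file' Hover='true' ]
[
            UseScanner
            ScannerId='commentScanner'
            InitialScanner='mathScanner'
            Hover='true'
]
%lexer mathLexer
%{

%}

[ Lexer Id='mathLexer' ]
%parser mathParser1
%{
[ Start ]
expression
            : additive-expression
            ;
%}
"""

for part in split_parts(SAMPLE):
    print(part.kind, part.name, part.attributes)
```

Because the content is consumed as a whole, attributes that appear inside a part, such as `[ Start ]` in the parser part, are not mistaken for attributes of a following part.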