Anglr File Structure
Introduction
An anglr file is composed of several parts. This division separates functionally different parts of the file from one another. Parts can be referenced from other parts where that makes sense; such references are created using attributes. How particular parts are referenced is explained below, once the parts themselves have been introduced.
Syntax
The following syntax rules define the structure of an anglr file:
RULE F-1
<anglr file> : <anglr file part list> ? ;
RULE F-2
<anglr file part list> : <anglr file part> + ;
RULE F-3
<anglr file part> : <general part> | <declaration part> | <scanner part> | <lexer part> | <parser part> ;
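As a rough illustration of how the ? in RULE F-1 and the + in RULE F-2 combine, the following Python sketch checks a sequence of part kinds against the three rules. It is a hypothetical helper, not part of anglr: it works on the part-kind keywords (general, declarations, scanner, lexer, parser) instead of real anglr text, and the function names is_part, is_part_list and is_anglr_file are invented for illustration.

```python
# Hypothetical recognizer for RULE F-1 .. RULE F-3. It works on a list of
# part-kind keywords (the word after '%'), not on real anglr source text.

# RULE F-3: the five kinds of anglr file parts.
PART_KINDS = {"general", "declarations", "scanner", "lexer", "parser"}


def is_part(kind: str) -> bool:
    """RULE F-3: <anglr file part> is one of the five part kinds."""
    return kind in PART_KINDS


def is_part_list(kinds: list[str]) -> bool:
    """RULE F-2: '+' means one or more parts."""
    return len(kinds) >= 1 and all(is_part(k) for k in kinds)


def is_anglr_file(kinds: list[str]) -> bool:
    """RULE F-1: '?' makes the part list optional, so an empty file is valid."""
    return len(kinds) == 0 or is_part_list(kinds)


if __name__ == "__main__":
    print(is_anglr_file([]))                                  # True: empty file
    print(is_anglr_file(["declarations", "scanner",
                         "scanner", "lexer", "parser"]))      # True
    print(is_anglr_file(["token"]))                           # False: unknown kind
```

Splitting the check into three functions mirrors the three rules one to one, which keeps the correspondence between grammar and code easy to follow.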
Discussion
RULE F-1 - Structure of Anglr File
RULE F-1 defines the syntax of an anglr file: it consists of a possibly empty list of anglr file parts. Because the part list is marked optional (?), an empty file is also valid.
RULE F-2 - Anglr File Part List
RULE F-2 defines the syntax of the anglr file part list: it is simply a nonempty sequence (+) of file parts. Parts are not separated by any special separator, only by white space.
RULE F-3 - Kind of Anglr File Parts
RULE F-3 defines all possible kinds of file parts:
- general part: where general definitions that apply to the whole file are given.
- declaration part: where declarations are specified.
- scanner part: where scanners are defined. A scanner part can reference any number of declaration parts.
- lexer part: where a lexical analyzer is defined. A lexer part can reference multiple scanner parts.
- parser part: where the syntax rules for a syntax analyzer are defined. A parser part can reference declaration parts, scanner parts and lexer parts.
A particular anglr file can contain any number of parts of each kind: how many parts, and of which kinds, depends on the needs of the project that uses the file.
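The reference rules listed for RULE F-3 can be summarized as a table from the kind of the referencing part to the kinds it may reference. The Python sketch below assumes that only the listed references are allowed; general and declaration parts are not described as referencing anything, so they get empty sets. ALLOWED_REFERENCES, may_reference and invalid_references are hypothetical names, not part of anglr.

```python
# Reference rules from RULE F-3: kind of referencing part -> kinds it may reference.
# Assumption: references not listed in the text are not allowed.
ALLOWED_REFERENCES = {
    "general": set(),
    "declarations": set(),
    "scanner": {"declarations"},
    "lexer": {"scanner"},
    "parser": {"declarations", "scanner", "lexer"},
}


def may_reference(source_kind: str, target_kind: str) -> bool:
    """True if a part of source_kind may reference a part of target_kind."""
    return target_kind in ALLOWED_REFERENCES.get(source_kind, set())


def invalid_references(parts: dict[str, str],
                       references: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """parts maps part name -> part kind; references lists (source, target) names.
    Returns every reference that names an undefined part or is not permitted."""
    bad = []
    for source, target in references:
        if source not in parts or target not in parts:
            bad.append((source, target))  # reference to an undefined part
        elif not may_reference(parts[source], parts[target]):
            bad.append((source, target))  # kind combination not allowed
    return bad


if __name__ == "__main__":
    parts = {"mathDecls": "declarations", "mathScanner": "scanner",
             "mathLexer": "lexer"}
    refs = [("mathScanner", "mathDecls"), ("mathLexer", "mathScanner"),
            ("mathLexer", "mathDecls")]
    print(invalid_references(parts, refs))  # [('mathLexer', 'mathDecls')]
```

Keeping the rules in a plain dictionary means the check itself stays a simple lookup; only the table would change if the permitted references turned out to be broader than the text lists.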
Example
This example was already shown on the introductory page; it is repeated here:
    [ Description Text='definitions of tokens and regular expressions used to define syntax']
    [ Description Text='of simple arithmetic expressions']
    [ CompilationInfo ClassName='MathDecls' NameSpace='Math.Declarations' Access='public']
    %declarations mathDecls
    %{
        %regex
        {
            decimal-digit   [0-9]
            number          {decimal-digit}+
        }
        %terminal
        {
            NUMBER
            add '+'
            sub '-'
            mul '*'
            div '/'
            lb '('
            rb ')'
            unknown
        }
    %}

    [ Description Text='definition of scanner, which extracts comments from input string']
    [ Declarations Id='mathDecls' ]
    [ CompilationInfo ClassName='CommentRegex' NameSpace='Math.ScannerLib' Access='public']
    %scanner commentScanner
    %{
        [\*]+\/     pop
        [\n\r]      skip
        [^\*]+      skip
        [\*]+       skip
    %}

    [ Description Text='definition of scanner, which extracts terminal symbols from input string']
    [ Declarations Id='mathDecls' ]
    [ CompilationInfo ClassName='MathRegex' NameSpace='Math.ScannerLib' Access='public']
    %scanner mathScanner
    %{
        \/\*        push commentScanner
        {number}    terminal NUMBER
        \+          terminal add
        \-          terminal sub
        \*          terminal mul
        \/          terminal div
        \(          terminal lb
        \)          terminal rb
        [ \t]+      skip
        [\n\r]      skip
        .           skip
    %}

    [ Description Text='Lexer for anglr file' Hover='true' ]
    [ UseScanner ScannerId='commentScanner' InitialScanner='mathScanner' Hover='true' ]
    [ CompilationInfo ClassName='MathLexer' NameSpace='Math.Lexer' Access='public' Hover='true' ]
    %lexer mathLexer
    %{
    %}

    [ Description Text='definition of parser for simple arithmetic expressions']
    [ Declarations Id='mathDecls' ]
    [ Lexer Id='mathLexer' ]
    [ CompilationInfo ClassName='MathParser' NameSpace='Math.Parser' Access='public']
    %parser mathParser1
    %{
        [ Start ]
        expression
            : additive-expression
            ;
        additive-expression
            : multiplicative-expression
            | additive-expression '+' multiplicative-expression
            | additive-expression '-' multiplicative-expression
            ;
        multiplicative-expression
            : unary-expression
            | multiplicative-expression '*' unary-expression
            | multiplicative-expression '/' unary-expression
            ;
        unary-expression
            : NUMBER
            | '(' expression ')'
            ;
    %}
The example above defines:
- no general part, since everything the anglr compiler needs is defined in the other parts
- one declaration part named mathDecls
- two scanner parts named commentScanner and mathScanner
- one lexer part named mathLexer
- one parser part named mathParser1
As the example shows, parts of an anglr file reference other parts through attributes listed immediately before the referencing part. The example contains the following reference relations:
- scanner parts commentScanner and mathScanner reference declaration part mathDecls through the attribute [ Declarations Id='mathDecls' ]. This reference defines the source of the data structures used by the scanners.
- lexer part mathLexer references scanner parts commentScanner and mathScanner through the attribute [ UseScanner ScannerId='commentScanner' InitialScanner='mathScanner' Hover='true' ]. This reference defines the structure of the lexical analyzer, which is built from two scanners: mathScanner is the initial scanner, and commentScanner is activated when needed (in the example, mathScanner pushes commentScanner when it reads /*, and commentScanner pops itself when it reads the closing */).
- parser part mathParser1 references declaration part mathDecls through the attribute [ Declarations Id='mathDecls' ] and lexer part mathLexer through the attribute [ Lexer Id='mathLexer' ]. These references define the data structures and the lexical analyzer used by the syntax analyzer.
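Because every reference points at a part whose data structures or analyzer the referencing part uses, the relations above form a directed graph. Assuming that a referenced part has to be processed before the parts that reference it (an assumption made for illustration, not a documented rule of the anglr compiler), a valid processing order is a topological order of that graph. This Python sketch uses the standard graphlib module (Python 3.9 or later); the function name dependency_order is hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Reference relations from the example: part name -> names of the parts it references.
EXAMPLE_REFERENCES = {
    "commentScanner": {"mathDecls"},
    "mathScanner": {"mathDecls"},
    "mathLexer": {"commentScanner", "mathScanner"},
    "mathParser1": {"mathDecls", "mathLexer"},
}


def dependency_order(references: dict[str, set[str]]) -> list[str]:
    """Return part names so that every referenced part comes before the parts
    that reference it. Raises graphlib.CycleError on circular references."""
    return list(TopologicalSorter(references).static_order())


if __name__ == "__main__":
    # mathDecls comes first and mathParser1 last; the two scanners may appear
    # in either order between them, followed by mathLexer.
    print(dependency_order(EXAMPLE_REFERENCES))
    try:
        dependency_order({"a": {"b"}, "b": {"a"}})
    except CycleError:
        print("circular references")
```

TopologicalSorter takes each node together with its predecessors, which matches the "part -> parts it references" shape of the table directly, so no edge reversal is needed.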
The example also shows that all parts of an anglr file are built in the same way:
- a part is preceded by an attribute list
- then comes the special token %part-type (where part-type is one of general, declarations, scanner, lexer or parser), followed by an identifier that names the part
- the content of the part is enclosed between the part brackets %{ and %}. The contents of parts can differ considerably, especially between parts of different kinds.
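The common shape of a part (attribute list, %part-type with a name, content between %{ and %}) is regular enough to split a file into its parts with a regular expression. The following Python sketch is a simplification, not the anglr compiler's parser: it assumes attribute text never contains ], part content never contains %}, and part names match [A-Za-z_][\w-]*. The names Part and split_parts are hypothetical.

```python
import re
from typing import NamedTuple

# Simplified pattern for one anglr file part (an assumption, not the anglr grammar):
# attribute list, %part-type, part name, then content between %{ and %}.
PART_RE = re.compile(
    r"(?P<attrs>(?:\s*\[[^\]]*\])*)\s*"                          # [ ... ] attributes
    r"%(?P<kind>general|declarations|scanner|lexer|parser)\s+"  # %part-type
    r"(?P<name>[A-Za-z_][\w-]*)\s*"                              # part name
    r"%\{(?P<content>.*?)%\}",                                   # shortest body up to %}
    re.DOTALL,
)
# Text inside one [ ... ] attribute, without the brackets and surrounding spaces.
ATTR_RE = re.compile(r"\[\s*([^\]]*?)\s*\]")


class Part(NamedTuple):
    kind: str
    name: str
    attributes: list[str]
    content: str


def split_parts(text: str) -> list[Part]:
    """Split anglr source text into parts, in order of appearance."""
    parts = []
    for m in PART_RE.finditer(text):
        parts.append(Part(
            kind=m.group("kind"),
            name=m.group("name"),
            attributes=ATTR_RE.findall(m.group("attrs")),
            content=m.group("content").strip(),
        ))
    return parts


if __name__ == "__main__":
    sample = """
    [ Lexer Id='mathLexer' ]
    %parser mathParser1
    %{ expression : NUMBER ; %}
    """
    for part in split_parts(sample):
        print(part.kind, part.name, part.attributes, part.content)
    # parser mathParser1 ["Lexer Id='mathLexer'"] expression : NUMBER ;
```

Applied to the example above, this sketch should return five parts (mathDecls, commentScanner, mathScanner, mathLexer and mathParser1), with an empty content string for mathLexer. The non-greedy .*? is what stops each part at its own %} instead of running on to the last one in the file.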