Declaration Part of Anglr File
Introduction
The contents of Anglr files consist of sequences of characters separated by spaces. These character sequences are words in Anglr. The lexical analyzer of the Anglr language needs to know how these words are composed, while the syntax analyzer needs to know what they mean. Both things, the structure of words and their meaning, are defined in the declaration part of an Anglr file. The structure of words is defined by regular expressions and their meaning is defined by terminal symbols. The declaration part therefore contains definitions of regular expressions and terminal symbols.
Syntax
The following syntax rules define the contents of the declaration part of an Anglr file.
RULE D-1
<declaration part> : <attribute list> ? '%declarations' <identifier> '%{' <anglr definition list> ? '%}' ;
RULE D-2
<anglr definition list> : <anglr definition with attribute> + ;
RULE D-3
<anglr definition with attribute> : <attribute list> ? <anglr definition> ;
RULE D-4
<anglr definition> : <single terminal definition> | <single regex definition> | <block of terminal definitions> | <block of regex definitions> ;
RULE D-5
<single terminal definition> : '%terminal' <terminal definition> ;
RULE D-6
<single regex definition> : '%regex' <regex definition> ;
RULE D-7
<block of terminal definitions> : '%terminal' '{' <block terminal definitions> ? '}' ;
RULE D-8
<block of regex definitions> : '%regex' '{' <block regex definitions> ? '}' ;
RULE D-9
<terminal definition> : <identifier> <cstring> ? ;
RULE D-10
<regex definition> : <identifier> <regular expression> ;
RULE D-11
<block terminal definitions> : <block terminal definition> + ;
RULE D-12
<block terminal definition> : <attribute list> ? <terminal definition> ;
RULE D-13
<block regex definitions> : <block regex definition> + ;
RULE D-14
<block regex definition> : <attribute list> ? <regex definition> ;
Here is an example of a declaration part:
[ Description Text='definitions of tokens and regular expressions used to define syntax']
[ Description Text='of simple arithmetic expressions']
[ CompilationInfo ClassName='MathDecls' NameSpace='Math.Declarations' Access='public']
%declarations mathDecls
%{
%regex {
    decimal-digit [0-9]
    number {decimal-digit}+
}
%terminal {
    NUMBER
    add '+'
    sub '-'
    mul '*'
    div '/'
    lb '('
    rb ')'
    unknown
}
%}
Every declaration part looks much like the one above. Interestingly, although the declaration part has a relatively simple structure, a relatively large number of syntax rules is needed to describe it.
Discussion
RULE D-1 - Structure of Declaration Part
Rule RULE D-1 defines the top-level structure of the declaration part:
- The declaration part is preceded by a list of attributes, which may be empty.
- The attribute list is followed by the reserved word %declarations and an identifier that names the declaration part.
- Between the part separators %{ and %} there is a list of declarations, which may be empty.
In the example above, the declaration part is preceded by three attributes:
[ Description Text='definitions of tokens and regular expressions used to define syntax']
[ Description Text='of simple arithmetic expressions']
[ CompilationInfo ClassName='MathDecls' NameSpace='Math.Declarations' Access='public']
The declaration part is named mathDecls:
%declarations mathDecls
and it contains two declarations:
%regex {
    decimal-digit [0-9]
    number {decimal-digit}+
}
%terminal {
    NUMBER
    add '+'
    sub '-'
    mul '*'
    div '/'
    lb '('
    rb ')'
    unknown
}
The first declaration is a block declaration of regular expressions and the second one is a block declaration of terminal symbols.
RULE D-2 - List of Anglr Definitions
Rule RULE D-2 defines the list of declarations. It is a sequence of declarations, following one another, delimited by space characters. In the example above, there are two declarations: a block declaration of regular expressions, followed by a block declaration of terminal symbols. The order of declarations is not important. It would make no difference if the above declarations were listed in this order:
%terminal {
    NUMBER
    add '+'
    sub '-'
    mul '*'
    div '/'
    lb '('
    rb ')'
    unknown
}
%regex {
    decimal-digit [0-9]
    number {decimal-digit}+
}
The same declarations can be specified using the single-form equivalents of block declarations:
%regex decimal-digit [0-9]
%regex number {decimal-digit}+
%terminal NUMBER
%terminal add '+'
%terminal sub '-'
%terminal mul '*'
%terminal div '/'
%terminal lb '('
%terminal rb ')'
%terminal unknown
The first list (block form) contains two definitions, one defining multiple regular expressions and the other defining multiple terminal symbols. The second list (single form) contains ten definitions, each defining a single entity.
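The relationship between the two forms can be sketched in a few lines of Python. This is only an illustration and not part of Anglr: to_single_form is a hypothetical helper, and the entries are copied from the example above.

```python
def to_single_form(keyword, entries):
    """Rewrite the entries of one block definition as single-form definitions.

    Hypothetical helper: it only prefixes each entry with the reserved word
    that precedes the block.
    """
    return [f"{keyword} {entry}" for entry in entries]


# The two block definitions from the example (two definitions in total).
regex_block = ["decimal-digit [0-9]", "number {decimal-digit}+"]
terminal_block = ["NUMBER", "add '+'", "sub '-'", "mul '*'",
                  "div '/'", "lb '('", "rb ')'", "unknown"]

# Their single-form equivalents (one definition per entity).
single_form = (to_single_form("%regex", regex_block)
               + to_single_form("%terminal", terminal_block))

for definition in single_form:
    print(definition)
```

Two block definitions expand into ten single-form definitions, one per regular expression or terminal symbol.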
RULE D-3 - Anglr Definition
Rule RULE D-3 defines the structure of each declaration and states that the declaration itself can be preceded by a list of attributes. In the example above, no declaration is preceded by an attribute list, but it could be, like this:
[ Description Text='regular expressions']
%regex {
    decimal-digit [0-9]
    number {decimal-digit}+
}
[ Description Text='terminal symbols']
%terminal {
    NUMBER
    add '+'
    sub '-'
    mul '*'
    div '/'
    lb '('
    rb ')'
    unknown
}
RULE D-4 - Kinds of Anglr Definition
Rule RULE D-4 lists all possible kinds of declarations. They are listed here in the same order as the productions of that rule:
- single terminal definition, defined by syntax rule <single terminal definition>
- single regular expression definition, defined by syntax rule <single regex definition>
- block of terminal definitions, defined by syntax rule <block of terminal definitions>
- block of regular expression definitions, defined by syntax rule <block of regex definitions>
From the discussion of RULE D-2 we can select an example of each kind of definition:
- example of a single terminal definition <single terminal definition>:
  %terminal NUMBER
- example of a single regular expression definition <single regex definition>:
  %regex decimal-digit [0-9]
- example of a block of terminal definitions <block of terminal definitions>:
  %terminal {
      NUMBER
      add '+'
      sub '-'
      mul '*'
      div '/'
      lb '('
      rb ')'
      unknown
  }
- example of a block of regular expression definitions <block of regex definitions>:
  %regex {
      decimal-digit [0-9]
      number {decimal-digit}+
  }
RULE D-5 - Single terminal definition
Rule RULE D-5 defines the structure of a single terminal definition:
- it begins with the reserved word %terminal
- the reserved word %terminal is followed by the definition of a terminal symbol
Single terminal declarations look like this:
%terminal NUMBER
or
%terminal add '+'
RULE D-6 - Single Regular Expression Definition
Rule RULE D-6 defines the structure of a single regular expression definition:
- it begins with the reserved word %regex
- followed by the definition of a regular expression
Examples of single regular expression definitions:
%regex decimal-digit [0-9]
%regex number {decimal-digit}+
Care should be taken when specifying regular expression definitions: the whole definition must be on the same line as the %regex reserved word and must not span more than one line. All characters following %regex, except the initial spaces, are taken as part of the regular expression definition. A regular expression definition should therefore not be followed by a comment on the same line, since the comment will be treated as part of the definition. In the following example
%regex decimal-digit [0-9] // decimal digits
decimal-digit will not match what one would expect:
- 0
- 1
- 2
- etc.
Instead, it will match:
- 0 // decimal digits
- 1 // decimal digits
- 2 // decimal digits
- etc.
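The pitfall can be reproduced with a small Python sketch. It rests on two assumptions: parse_regex_definition is a hypothetical helper that splits the line the way the text describes, and Python's re module stands in for Anglr's own regular expression engine, which may differ in detail.

```python
import re


def parse_regex_definition(line):
    """Split a single-form '%regex <name> <pattern>' line.

    Hypothetical helper: everything after the name, except the spaces
    immediately following it, up to the end of the line is the pattern.
    """
    keyword, rest = line.split(None, 1)
    if keyword != "%regex":
        raise ValueError("not a %regex definition")
    name, pattern = rest.split(None, 1)
    return name, pattern.rstrip("\n")


name, pattern = parse_regex_definition(
    "%regex decimal-digit [0-9] // decimal digits")

# The trailing comment has become part of the pattern.
print(repr(pattern))

# A plain digit no longer matches, but a digit followed by the comment does.
print(re.fullmatch(pattern, "0") is not None)
print(re.fullmatch(pattern, "0 // decimal digits") is not None)
```

Placing the comment on a separate line, before or after the definition, avoids the problem.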
RULE D-7 - Block of Terminal Definitions
Rule RULE D-7 defines the structure of a whole block of terminal definitions:
- it begins with the reserved word %terminal
- followed by a block of terminal definitions enclosed in curly braces { and }
An example of a block declaration is this excerpt, already shown several times:
%terminal {
    NUMBER
    add '+'
    sub '-'
    mul '*'
    div '/'
    lb '('
    rb ')'
    unknown
}
RULE D-8 - Block of Regular Expression Definitions
Rule RULE D-8 defines the structure of a whole block of regular expression definitions:
- it begins with the reserved word %regex
- followed by a block of regular expression definitions enclosed in curly braces { and }
An example of a block declaration:
%regex {
    decimal-digit [0-9]
    number {decimal-digit}+
}
RULE D-9 - Terminal Definition
Rule RULE D-9 defines the structure of a terminal definition, whether it is part of a single or a block definition. It consists of:
- the terminal name, which is in fact a value within an enumeration
- optionally, the text representation of the terminal symbol, which follows the terminal name
In the example above, eight terminal symbols are defined. Six of them are specified together with their text representation, because each of them always has only one value, the one specified in the declaration.
NUMBER
add '+'
sub '-'
mul '*'
div '/'
lb '('
rb ')'
unknown
The NUMBER terminal symbol is defined without a text representation because it has an infinite pool of different values. We could still give it a text representation, e.g. the sign '0'. With such a sign, the terminal symbol could also appear in the syntax rules. However, using terminal symbols in this way is not advised, as it may lead to ambiguity.
At this point, the difference between the number of definitions that appear in the declaration part and the number of terminal symbols and regular expressions they define should be noted. As we can see, the declaration part contains two definitions, in which eight terminal symbols and two regular expressions are defined.
RULE D-10 - Regular Expression Definition
Rule RULE D-10 defines the structure of a regular expression definition, whether it is part of a single or a block definition. It consists of:
- the regular expression name
- followed by the regular expression itself. Everything from the regular expression name to the end of the line is treated as part of the regular expression, except the space characters immediately following the name
Regular expressions are composed according to the rules defined on this web page:
In the example above, two regular expressions are defined:
decimal-digit [0-9]
number {decimal-digit}+
As we can see, a regular expression can be defined in terms of other regular expressions. For example, the regular expression number is defined with the regular expression decimal-digit. We did this by writing the name of the regular expression decimal-digit between curly braces in the definition of the regular expression number. This technique can be used without any specific restrictions:
- The order of the definitions of regular expressions is not important. A regular expression used in the definition of another regular expression can be defined after its use.
- There is no limit to the number of regular expressions used. In the definition of a regular expression, you can use as many other regular expressions as necessary.
However, care should be taken with the following:
- A regular expression must not be defined in terms of itself, neither explicitly nor implicitly, e.g. by a circular definition of regular expressions: A is defined by B, B by C and C by A.
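How such references might be resolved, and why circular definitions cannot be, can be sketched in Python. This is an illustration under stated assumptions, not the Anglr implementation: expand and REFERENCE are hypothetical names, the identifier syntax inside the braces is guessed from the examples, and the result is written in Python's regular expression syntax.

```python
import re

# Assumed reference syntax: a name in curly braces, e.g. {decimal-digit}.
# Requiring a leading letter keeps quantifiers such as {3} untouched.
REFERENCE = re.compile(r"\{([A-Za-z][A-Za-z0-9_-]*)\}")


def expand(name, definitions, in_progress=()):
    """Replace every {name} reference by the referenced expression.

    `in_progress` holds the names currently being expanded; meeting one of
    them again means the definition is circular.
    """
    if name in in_progress:
        cycle = " -> ".join(in_progress + (name,))
        raise ValueError(f"circular regular expression definition: {cycle}")
    if name not in definitions:
        raise KeyError(f"undefined regular expression: {name}")

    def substitute(match):
        inner = expand(match.group(1), definitions, in_progress + (name,))
        return f"(?:{inner})"  # group so that a following + or * applies to it all

    return REFERENCE.sub(substitute, definitions[name])


# The order of definitions does not matter: number is listed before
# decimal-digit, which it uses.
definitions = {"number": "{decimal-digit}+", "decimal-digit": "[0-9]"}
print(expand("number", definitions))

# A is defined by B, B by C and C by A: expansion can never finish.
circular = {"A": "{B}", "B": "{C}", "C": "{A}"}
try:
    expand("A", circular)
except ValueError as error:
    print(error)
```

Tracking the names currently being expanded is enough to catch both the explicit case (A defined by A) and the implicit one (A defined by B, B by C, C by A).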
RULE D-11 - List of Terminal Definitions
Rule RULE D-11 defines a block of terminal definitions as a nonempty contiguous list of terminal definitions separated by space characters, for example:
NUMBER
add '+'
sub '-'
mul '*'
div '/'
lb '('
rb ')'
unknown
RULE D-12 - Terminal Definition within Block
Rule RULE D-12 defines a terminal definition within a list of terminal definitions. It is similar to a single terminal definition, but because it is defined within a block of terminals, it need not be preceded by the reserved word %terminal; that word already precedes the block. Like single terminal definitions, block terminal definitions can also be preceded by an attribute list.
For example, this block definition:
%terminal {
    NUMBER
    [ Description Text='addition operator']
    add '+'
    sub '-'
    mul '*'
    div '/'
    lb '('
    rb ')'
    unknown
}
and these terminal definitions:
%terminal NUMBER
[ Description Text='addition operator']
%terminal add '+'
%terminal sub '-'
%terminal mul '*'
%terminal div '/'
%terminal lb '('
%terminal rb ')'
%terminal unknown
are equivalent. The example also shows how to specify attributes for single terminal definitions and for terminal definitions within a block of terminal definitions.
RULE D-13 - List of Regular Expression Definitions in Block
Rule RULE D-13 defines a block of regular expressions. It is a sequence of regular expression definitions separated by space characters, as in this example:
decimal-digit [0-9]
number {decimal-digit}+
RULE D-14 - Regular Expression Definition in Block
Rule RULE D-14 defines a regular expression definition within a block of regular expressions. It can be preceded by an attribute list, as in this example:
[ Description Text='number is composed of a sequence of consecutive decimal digits']
number {decimal-digit}+
Attributes
The Anglr compiler requires that the attribute list belonging to the declaration part contain the CompilationInfo attribute, as in this example:
[ CompilationInfo ClassName='MathDecls' NameSpace='Math.Declarations' Access='public']
The following named values are used in the CompilationInfo attribute:
- ClassName: name of the class that will be generated for the declaration part.
- NameSpace: namespace of the class that will be generated for the declaration part.
- CodeDir: name of the directory that will contain the source code of the class generated for the declaration part.
- Access: access specifier of the class that will be generated for the declaration part.
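The shape of such an attribute, a name followed by Name='value' pairs, can be illustrated with a short Python sketch. parse_attribute is a hypothetical helper written only for this illustration. It assumes values are enclosed in single quotes without escaped quotes inside them, and it says nothing about how the Anglr compiler itself reads attributes.

```python
import re

# Assumed shape of a named value: Name='value', with no quote inside the value.
NAMED_VALUE = re.compile(r"(\w+)='([^']*)'")


def parse_attribute(text):
    """Split '[ Name Key='value' ... ]' into the name and a dict of values."""
    body = text.strip()
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError("attribute must be enclosed in [ and ]")
    parts = body[1:-1].split(None, 1)
    if not parts:
        raise ValueError("attribute name is missing")
    name = parts[0]
    values = dict(NAMED_VALUE.findall(parts[1])) if len(parts) > 1 else {}
    return name, values


name, values = parse_attribute(
    "[ CompilationInfo ClassName='MathDecls' "
    "NameSpace='Math.Declarations' Access='public']")

print(name)
for key, value in values.items():
    print(key, "=", value)
```

The example attribute sets ClassName, NameSpace and Access; CodeDir does not appear in it.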