Anglr Language

Introduction

This section presents the basic procedures for creating syntax rules, which we need to know in order to understand the contents of anglr files. These files are built from functionally different parts, and the contents of each part must follow its own set of syntax rules. This section introduces the notation for syntax rules with which all parts of an anglr file are described, including the part that contains syntax rules itself. The difficulty is that we have to use syntax rules to describe the syntax rules themselves.

The Anglr programming language is used to define the syntax rules that a program uses for text analysis. Anglr files contain many other things as well, but they are not important at this moment. With the Anglr programming language one can define the syntax rules of a particular programming language such as C#, Java, etc. It can also be used to define syntax rules for any kind of text analyzer or translator used for different purposes:

to check structure of text document
to translate text files or part of them into another form
to extract interesting pieces of information from text files or text strings
to colorize text or to make graphical representations of text files with a given structure
in general, every syntax oriented transformation of structured text

Canonical form of syntax rules

Syntax rules are the most important feature of anglr files. They are contained in the parser part of an anglr file. They are written in a way similar to that of other grammar notations and tools, such as BNF, Antlr, etc. There are also other parts of anglr files, but to understand them we must first understand how to assemble syntax rules, since all parts of anglr files are defined using syntax rules, even the part which contains syntax rules itself. Syntax rules are defined in the following way:

first, we specify the name of the syntax rule. Names of syntax rules are called non-terminal symbols and are used to form other syntax rules.
the name of the syntax rule is followed by a colon character
the colon is followed by at least one production of that rule. Productions are separated by vertical line characters. A production is a sequence of terminal and nonterminal symbols. Terminal symbols represent lexical tokens of the text being analyzed, nonterminal symbols represent syntax rules.
the syntax rule is completed with a semicolon character.

Example:

additive-expression
    : multiplicative-expression
    | additive-expression '+' multiplicative-expression
    | additive-expression '-' multiplicative-expression
    ;

In the above example:

a syntax rule named additive-expression is defined. Names used in this way are called non-terminal symbols. They are identifiers composed of alphanumeric characters and separators such as the minus sign and punctuation characters. If they are enclosed in < and > characters, they can also contain space characters. The first character of a name must be a letter.
it is composed of three productions
- multiplicative-expression
- additive-expression '+' multiplicative-expression
- additive-expression '-' multiplicative-expression
productions contain these terminal symbols: '+' and '-'.
two nonterminal symbols appear in the productions: multiplicative-expression (which is not defined in this example) and additive-expression itself. The rule is therefore recursively defined, which is perfectly normal.
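
The analysis above can be reproduced mechanically. The following Python sketch is not part of Anglr. It stores the rule as plain data and sorts its symbols into terminal and nonterminal ones. The dictionary layout and the helper name classify_symbols are assumptions made for illustration only, as is the convention that a quoted symbol is a terminal symbol and any other symbol is a nonterminal symbol.

```python
# Hypothetical in-memory form of the additive-expression rule:
# rule name -> list of productions, each production a list of symbols.
RULES = {
    "additive-expression": [
        ["multiplicative-expression"],
        ["additive-expression", "'+'", "multiplicative-expression"],
        ["additive-expression", "'-'", "multiplicative-expression"],
    ],
}


def classify_symbols(productions):
    """Split the symbols of a rule into terminals and nonterminals.

    Assumption for this sketch: a symbol written in quotes is a terminal,
    any other symbol is a nonterminal.
    """
    terminals, nonterminals = set(), set()
    for production in productions:
        for symbol in production:
            if symbol[0] in "'\"":
                terminals.add(symbol)
            else:
                nonterminals.add(symbol)
    return terminals, nonterminals


terminals, nonterminals = classify_symbols(RULES["additive-expression"])
# The rule is recursive when its own name occurs in its productions.
is_recursive = "additive-expression" in nonterminals
print(sorted(terminals), sorted(nonterminals), is_recursive)
```

The convention used here only works while terminal symbols are written as character strings; terminal symbols referred to by identifiers would need a list of declared terminals instead.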

Names of terminal and non-terminal symbols are composed in the same way. They are character strings composed of alphanumeric characters. The first character must be a letter, upper or lower case. Separators such as the minus sign and punctuation characters may appear between the characters. Names may also be enclosed in < and > characters; in this case they can contain space characters as well. Examples of valid names:

additive-expression
<additive expression>

Terminal symbols can also be represented by their character string representations. A character string representation is composed of any number of characters enclosed between single or double quotation marks. Quotation marks may also appear inside a character string, but they must be written twice.
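
The doubling of quotation marks can be illustrated with a few lines of Python. This is only a sketch of the convention described above, not code taken from the Anglr compiler. The function name decode_literal is hypothetical, and the sketch assumes that only the quotation mark used as the delimiter has to be doubled inside the string.

```python
def decode_literal(text):
    """Return the characters denoted by a quoted terminal representation.

    Assumption for this sketch: only the quotation mark that delimits
    the string has to be written twice inside it.
    """
    if len(text) < 2 or text[0] not in "'\"" or text[-1] != text[0]:
        raise ValueError("not a quoted character string: " + text)
    quote = text[0]
    body = text[1:-1]
    # After removing all doubled marks, no single delimiter may remain.
    if quote in body.replace(quote * 2, ""):
        raise ValueError("quotation mark is not doubled in: " + text)
    return body.replace(quote * 2, quote)


print(decode_literal("'+'"))
print(decode_literal("'it''s'"))
print(decode_literal('"say ""hi"""'))
```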

The layout of a syntax rule is irrelevant. We can use the same layout as in the example above, or we can write the whole rule on a single line, or use any other form one can imagine. Everyone should use the form they find most appropriate. In other words, it is not the layout that matters, but the content.

In the above example we used character string representations of terminal symbols. These symbols can also be defined somewhere else with identifiers. Suppose that we associated the identifier add with the character string '+' and the identifier sub with the character string '-'. With these definitions we can rewrite the above example as follows:

additive-expression
    : multiplicative-expression
    | additive-expression add multiplicative-expression
    | additive-expression sub multiplicative-expression
    ;

Syntax rules in the above examples are given in canonical form.

Let's define: canonical syntax rules are those rules whose productions consist exclusively of terminal symbols (including their character string representations) and non-terminal symbols.

Extended form of syntax rules

There are also other constructs which are used to build the syntax rules of anglr files:

cardinality operators
nested syntax rules. They will be explained later, since we don't need them to explain structure of Anglr files.

Cardinality operators

In addition to terminal and nonterminal symbols, special operators are also used to form productions of syntax rules. They are used to express recursive rules as sequences. Formally speaking, operators are not needed: every production can be written without them, and operators only shorten the syntax rule notation. However, it is usually better to use operators, because syntax analyzers built from rules with operators are typically faster than those built without them. The operators are named after their most important property, the cardinality of the set of sentential forms derived by the syntax rules associated with them, although they have other properties, too. These properties are:

cardinality of derived set of sentential forms:
- syntax rule derives a set of nonempty sentential forms - positive closure
- syntax rule derives a set of sentential forms with the empty sentence included - Kleene closure
- syntax rule derives a set of two sentential forms, one of which is empty - optional occurrence
type of recursion to be taken into account in the syntax rule:
- extended syntax rule is expressed with canonical syntax rule using left recursion
- extended syntax rule is expressed with canonical syntax rule using right recursion
usage of separators between listed elements:
- separators are located between the elements
- separators are located around the elements

The following table lists all operators together with their properties:

Operator	Cardinality	Recursion	Separators	Usage
+	positive closure	left	between	object lists, left associative arithmetic expressions
-	positive closure	left	around	object lists, left associative arithmetic expressions
~+	positive closure	right	between	right associative arithmetic expressions
~-	positive closure	right	around	right associative arithmetic expressions
*	Kleene closure	left	between	object lists, left associative arithmetic expressions
/	Kleene closure	left	around	object lists, left associative arithmetic expressions
~*	Kleene closure	right	between	right associative arithmetic expressions
~/	Kleene closure	right	around	right associative arithmetic expressions
?	optional element			object lists, can be used in combination with positive closures to form Kleene closures

In other words, operators can be described as follows:

+, -, ~+ and ~- are used to form a nonempty sequence of symbols. Operators + and - are left associative and should be used to form syntax rules for left associative arithmetic expressions. Operators ~+ and ~- are right associative and should be used to form syntax rules for right associative arithmetic expressions. All four can be used to form syntax rules for lists of objects, but + and - are preferred for lists, since the Anglr compiler generates LR parsers. In short: for arithmetic expressions choose the operator that matches the associativity of the expression, and for lists choose a left associative operator.
*, /, ~* and ~/ are used to form any sequence of symbols, including the empty one. The use of these operators is subject to the same rules as discussed above.
? is used to form an optional occurrence of a symbol. Remember that a Kleene closure can be expressed as an optional positive closure: * <==> +? That is why it is sometimes more appropriate to use a positive closure in combination with the operator ? instead of a Kleene closure. The reason is that the Anglr compiler sometimes generates intermediate syntax rules when Kleene closures are used, and the names of these rules are not known to us in advance. If we instead use the two step approach (a syntax rule for the positive closure followed by an optional syntax rule for the previous one), we can choose the names of the syntax rules ourselves. This will become clearer later, where operators using Kleene closures are described in detail.

Separators

Symbols found in productions of syntax rules are sometimes separated by separators. Separators are also symbols, mostly terminal symbols, and most often ordinary characters like commas, dots, etc. The operators listed above differ in how they place separators. Operators + and * are defined so that the separators are located between the symbols, while operators - and / are defined so that the separators are located around the symbols. If no separators are used, operators + and -, just like * and /, are the same. The sentential forms derived by both kinds of operators are otherwise very similar. The difference is in the first and last element of the sequence: if it is a separator, the sentential form is a sequence of symbols with separators around them, otherwise it is a sequence of symbols with separators between them. In fact, operators - and / invert the roles that symbols and separators have in operators + and *.

Operators without separators

First we define the operators for sequences of symbols without separators, and later the same operators for sequences of symbols with separators. Each operator is defined by a simple example and the rule which translates this example into canonical form. Note that these definitions are not entirely accurate. They are a subset of the exact definitions, but wide enough for use in this section. Exact definitions of cardinality operators are given in the section about the parser part of an anglr file.

Operator +

Operator + is defined in the following way. A syntax rule using operator +:

A
    : B +
    ;

is equivalent to the following canonical form:

A
    : B
    | A B
    ;

The set of sentential forms derived by operator + is equal to {B, BB, BBB, ...}

Operator ~+

This operator is the same as operator + except that it is defined with right recursion. A syntax rule using operator ~+:

A
    : B ~+
    ;

is equivalent to the following canonical form:

A
    : B
    | B A
    ;

The set of sentential forms derived by operator ~+ is equal to the set of sentential forms derived by operator + namely {B, BB, BBB, ...}
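
The difference between the two canonical forms is the order in which the symbols are produced, not the final result. The following Python sketch records the derivation steps for both forms. It is only an illustration written for this text: the function name derivation and the representation of productions as lists are assumptions, not anything generated by the Anglr compiler.

```python
def derivation(recursive, n):
    """Derive the sentential form with n symbols B, recording every step.

    recursive is the recursive production of A as a list of symbols:
    ["A", "B"] for operator + and ["B", "A"] for operator ~+.
    """
    form = ["A"]
    steps = [" ".join(form)]
    for step in range(n):
        # Use the recursive production n - 1 times, then finish with A : B.
        production = recursive if step < n - 1 else ["B"]
        position = form.index("A")
        form[position:position + 1] = production
        steps.append(" ".join(form))
    return steps


left = derivation(["A", "B"], 3)    # operator +
right = derivation(["B", "A"], 3)   # operator ~+
print(left)   # ['A', 'A B', 'A B B', 'B B B']
print(right)  # ['A', 'B A', 'B B A', 'B B B']
```

With left recursion the sequence grows to the right of A, with right recursion it grows to the left of A, and both derivations end in the same sentential form.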

Operator -

Operator - without separators, is defined in the same way as operator +. Syntax rule using operator -:

A
    : B -
    ;

is equivalent to the following canonical form:

A
    : B
    | A B
    ;

The set of sentential forms derived by operator - is equal to {B, BB, BBB, ...}

Operator ~-

This operator is the same as operator - except that it is defined with right recursion. A syntax rule using operator ~-:

A
    : B ~-
    ;

is equivalent to the following canonical form:

A
    : B
    | B A
    ;

The set of sentential forms derived by operator ~- is equal to the set of sentential forms derived by operator - (or +) namely {B, BB, BBB, ...}

Operator *

Operator * is defined similarly to operator +. A syntax rule using operator *:

A
    : B *
    ;

is equivalent to the following canonical form:

A
    : %empty
    | A B
    ;

The set of sentential forms derived by operator * is equal to {e, B, BB, BBB, ...}. It is equal to the union of set of sentential forms derived by operator + and an empty string e.

Note that operator * is equivalent to operator ? applied to operator +: B * <==> B + ?

Operator ~*

Operator ~* is defined similarly to operator *, except that it is defined with right recursion. A syntax rule using operator ~*:

A
    : B ~*
    ;

is equivalent to the following canonical form:

A
    : %empty
    | B A
    ;

The set of sentential forms derived by operator ~* is equal to the set of sentential forms derived by operator *, namely {e, B, BB, BBB, ...}. It is equal to the union of the set of sentential forms derived by operator + and the empty string e.

Note that operator ~* is equivalent to operator ? applied to operator ~+: B ~* <==> B ~+ ?

Operator /

Operator / without separators is defined in the same way as operator *. A syntax rule using operator /:

A
    : B /
    ;

is equivalent to the following canonical form:

A
    : %empty
    | A B
    ;

The set of sentential forms derived by operator / is equal to {e, B, BB, BBB, ...}. It is equal to the union of set of sentential forms derived by operator + and an empty string e.

Operator ~/

Operator ~/ without separators is defined in the same way as operator ~*. A syntax rule using operator ~/:

A
    : B ~/
    ;

is equivalent to the following canonical form:

A
    : %empty
    | B A
    ;

The set of sentential forms derived by operator ~/ is equal to {e, B, BB, BBB, ...}. It is equal to the union of the set of sentential forms derived by operator + and the empty string e.

Operator ?

Operator ? is defined in the following way. A syntax rule using operator ?:

A
    : B ?
    ;

is equivalent to the following canonical form:

A
    : %empty
    | B
    ;

The set of sentential forms, derived by operator ?, is equal to {e, B}.

Separators

Symbols that appear in sequences are often separated by other symbols. We call them separators. Separators are specified in square brackets immediately following the operator. They can be arbitrary syntax rules, even syntax rules specified with operators. They can be nested to arbitrary depth: syntax rules used as separators can use separators as well.

Using operators and separators, the additive-expression example from the section on canonical form can be rewritten as follows:

additive-expression
    : multiplicative-expression + [ '+' ]
    | multiplicative-expression + [ '-' ]
    ;

Do not confuse + with '+'. The first is the operator, while the second is a terminal symbol. The same is true for the other operators -, *, / and ?.

The last example can be rewritten in an even shorter form:

additive-expression
    : multiplicative-expression + [ '+' | '-' ]
    ;

The use of separators changes the definitions of the operators. The definition of operator + changes only slightly.

Separators used with operator +

Operator + with a separator defines a sequence of symbols with separators between them. A syntax rule using operator + with separator S:

A
    : B + [ S ]
    ;

is equivalent to the following canonical form:

A
    : B
    | A S B
    ;

The set of sentential forms derived by operator + with a separator is equal to {B, BSB, BSBSB, BSBSBSB, ...}

Separators used with operator ~+

Operator ~+ with a separator defines a sequence of symbols with separators between them. Its canonical form is similar to the canonical form of operator +. The difference is in the type of recursion: it uses right recursion. A syntax rule using operator ~+ with separator S:

A
    : B ~+ [ S ]
    ;

is equivalent to the following canonical form:

A
    : B
    | B S A
    ;

The set of sentential forms derived by operator ~+ with a separator is equal to the set of sentential forms derived by operator +, namely {B, BSB, BSBSB, BSBSBSB, ...}

Separators used with operator -

Operator - with a separator defines a sequence of symbols with separators around them. In effect, it exchanges the roles of symbols and separators in the definition of operator +, with a minor difference: the first member of the resulting sequence is not taken into account. That first member would be S, the second SBS, and so on, so the sequence must start with the member SBS. So let us define: a syntax rule using operator - with separator S:

A
    : B - [ S ]
    ;

is equivalent to the following canonical form:

A
    : S B S
    | A B S
    ;

The set of sentential forms derived by operator - with a separator is equal to {SBS, SBSBS, SBSBSBS, SBSBSBSBS, ...}. It is equal to the set of sentential forms derived by S + [B] without the first member S.
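
The relation between the two placements of separators can be checked with a short Python sketch. It builds the n-th sentential form directly from the canonical forms of B + [ S ] and B - [ S ] given above. The function names between and around are made up for this illustration.

```python
def between(n, element="B", separator="S"):
    """n-th sentential form (n >= 1) of: element + [ separator ]."""
    return separator.join([element] * n)


def around(n, element="B", separator="S"):
    """n-th sentential form (n >= 1) of: element - [ separator ]."""
    return separator + (element + separator) * n


print([between(n) for n in (1, 2, 3)])  # ['B', 'BSB', 'BSBSB']
print([around(n) for n in (1, 2, 3)])   # ['SBS', 'SBSBS', 'SBSBSBS']

# B - [ S ] yields the forms of S + [ B ] with the first member (a lone S)
# left out: the n-th form of the former is the (n + 1)-th form of the latter.
for n in range(1, 6):
    assert around(n) == between(n + 1, element="S", separator="B")
```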

Separators used with operator ~-

Operator ~- with a separator defines a sequence of symbols with separators around them. Its canonical form is similar to the canonical form of operator -, but written in right recursive form. A syntax rule using operator ~- with separator S:

A
    : B ~- [ S ]
    ;

is equivalent to the following canonical form:

A
    : S B S
    | S B A
    ;

The set of sentential forms derived by operator ~- with a separator is equal to the set of sentential forms derived by operator -, namely {SBS, SBSBS, SBSBSBS, SBSBSBSBS, ...}

Separators used with operator *

A syntax rule using operator * with a separator is not defined analogously to operator + with a separator, since this would produce intuitively unexpected results.

If the operator * were defined in a similar way to the operator +, its definition:

A
    : B * [ S ]
    ;

would be equivalent to the following canonical form:

A
    : %empty
    | A S B
    ;

Since this syntax rule derives the same set of sentential forms {e, SB, SBSB, SBSBSB, ...} as the following rule:

A
    : (S B) *
    ;

this is not exactly acceptable. Since B* is in fact equivalent to B+?, we would expect that B*[S] is equivalent to (B+[S])?

So let us define: a syntax rule using operator * with a separator:

A
    : B * [ S ]
    ;

is equivalent to syntax rule:

A
    : (B + [ S ]) ?
    ;

and to this canonical form:

A
    : %empty
    | C
    ;
C
    : B
    | C S B
    ;

which derives the same set of sentential forms {e, B, BSB, BSBSB, BSBSBSB, ...} as the union of sentential forms derived by operator + with separator:

A
    : B + [ S ]
    ;

and an empty string.

Note that the above canonical form cannot be specified in the following way:

A
    : %empty
    | B
    | A S B
    ;

since the set of sentential forms generated by this form is a proper superset of the previous one. It contains all elements of the previous set and also elements which begin with the separator S.
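
The difference between the two forms can be made visible by enumerating short sentential forms. The Python sketch below is a small brute-force enumerator written only for this illustration; it is not how the Anglr compiler works. The function name derive and the dictionary representation of rules are assumptions, and an empty list stands for %empty.

```python
def derive(rules, start, max_len):
    """Return all terminal strings of at most max_len symbols derived from start.

    rules maps a rule name to a list of productions (lists of symbols);
    a symbol without an entry in rules is treated as a terminal symbol.
    Assumption: every recursive production adds a terminal symbol, so
    pruning by the number of terminals keeps the search finite.
    """
    results, seen, pending = set(), {(start,)}, [(start,)]
    while pending:
        form = pending.pop()
        index = next((i for i, s in enumerate(form) if s in rules), None)
        if index is None:
            results.add("".join(form))
            continue
        for production in rules[form[index]]:
            new_form = form[:index] + tuple(production) + form[index + 1:]
            terminals = sum(1 for s in new_form if s not in rules)
            if terminals <= max_len and new_form not in seen:
                seen.add(new_form)
                pending.append(new_form)
    return results


# Canonical form of B * [ S ] with the intermediate rule C.
TWO_STEP = {"A": [[], ["C"]], "C": [["B"], ["C", "S", "B"]]}
# The form that must not be used instead.
MERGED = {"A": [[], ["B"], ["A", "S", "B"]]}

two_step = derive(TWO_STEP, "A", 5)
merged = derive(MERGED, "A", 5)
print(sorted(two_step, key=len))      # ['', 'B', 'BSB', 'BSBSB']
print(sorted(merged - two_step))      # ['SB', 'SBSB']
```

The extra sentential forms SB and SBSB begin with the separator, which is exactly what the two step definition avoids.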

Separators used with operator ~*

A syntax rule using operator ~* with a separator is defined similarly to operator * with a separator. The difference is that its canonical form uses right recursion. A syntax rule using operator ~* with a separator:

A
    : B ~* [ S ]
    ;

is equivalent to syntax rule:

A
    : (B ~+ [ S ]) ?
    ;

and to this canonical form:

A
    : %empty
    | C
    ;
C
    : B
    | B S C
    ;

which derives the same set of sentential forms as operator * with a separator, namely {e, B, BSB, BSBSB, BSBSBSB, ...}.

Separators used with operator /

A syntax rule using operator / with a separator is defined similarly to operator * with a separator:

A
    : B / [ S ]
    ;

is equivalent to syntax rule:

A
    : (B - [ S ]) ?
    ;

and to this canonical form:

A
    : %empty
    | C
    ;
C
    : S B S
    | C B S
    ;

which derives the same set of sentential forms {e, SBS, SBSBS, SBSBSBS, SBSBSBSBS, ...} as the union of sentential forms derived by operator - with separator:

A
    : B - [ S ]
    ;

and an empty string.

Again, the above canonical form cannot be expressed in this way:

A
    : %empty
    | S
    | A B S
    ;

Separators used with operator ~/

A syntax rule using operator ~/ with a separator is defined similarly to operator / with a separator. The difference is in the type of recursion used in its canonical form. A syntax rule using operator ~/:

A
    : B ~/ [ S ]
    ;

is equivalent to syntax rule:

A
    : (B ~- [ S ]) ?
    ;

and to this canonical form:

A
    : %empty
    | C
    ;
C
    : S B S
    | S B C
    ;

which derives the same set of sentential forms {e, SBS, SBSBS, SBSBSBS, SBSBSBSBS, ...} as operator / with a separator.

Separators used with operator ?

Operator ? does not use a separator, since there is nothing to separate: sentential forms associated with this operator contain at most one symbol.

Don't be too "creative" when you define syntax rules for sequences of syntactically identical objects. Use cardinality operators instead. This ensures that the syntax rules (their canonical forms) are always composed in the same way, namely the way the Anglr compiler composes them.

Syntax rules and sentential forms

Now is a good time to delve deeper into the composition of the set of sentential forms derived by syntax rules written with the help of operators.

Let's take another look at the two definitions of additive-expression given in the Separators section above: the first with two productions, one for each separator, and the second with a single production and the separator [ '+' | '-' ]. At first glance it seems that these definitions are not equivalent, since they appear to derive different sets of sentential forms. The set derived by the first definition is supposed to be the union of two sets: the first consists of multiplicative-expressions connected with the terminal symbol '+', the second of multiplicative-expressions connected with the terminal symbol '-'. The second definition, on the other hand, derives multiplicative-expressions connected with a mix of both terminal symbols. These sets are clearly different.

But that is a false conclusion. When analyzing a language, or the set of sentential forms derived by the syntax rules of that language, we should never work with syntax rules that contain operators. We must first derive syntax rules whose productions contain only terminal and non-terminal symbols and then perform the analysis, as in the example below. The first definition of the syntax rule additive-expression can be written in this form:

additive-expression
    : multiplicative-expression + [ '+' ]
    ;
additive-expression
    : multiplicative-expression + [ '-' ]
    ;

From the above notation we could incorrectly conclude that the arithmetic operators '+' and '-' cannot be used in the same arithmetic expression. But this assumption is incorrect. If we follow the definition of operator +, we obtain these syntax rules for additive-expression:

additive-expression
    : multiplicative-expression
    | additive-expression '+' multiplicative-expression
    ;
additive-expression
    : multiplicative-expression
    | additive-expression '-' multiplicative-expression
    ;

If we ignore the duplicated definitions, we have essentially a rewrite of the original definition. But now we see clearly that we can use both arithmetic operators in the same arithmetic expression, since we can alternate between the productions of additive-expression when deriving sentential forms for that syntax rule. For example, this derivation is quite legal for the syntax rule additive-expression:

additive-expression
    --> additive-expression '-' multiplicative-expression
    --> additive-expression '+' multiplicative-expression '-' multiplicative-expression
    --> multiplicative-expression '+' multiplicative-expression '-' multiplicative-expression

Following the definition of operator + again, the second definition of the syntax rule additive-expression can be written in this form:

additive-expression
    : multiplicative-expression
    | additive-expression ( '+' | '-' ) multiplicative-expression
    ;

which is nothing but a shorter form of the original definition.

Since in both cases we get the same rules, we conclude that the first and second definitions written with operators are equivalent and that the first derives the same set of sentential forms as the second.

This example provides a good insight into the relation between syntax rules written with operators and the composition of the set of sentential forms derived by these rules. This knowledge can be useful when assembling syntax rules. Syntax rules should be as short as possible and consist of as few productions as possible. The length of a syntax rule, and in particular the number of its productions, affects the speed of the syntax analyzer. The fewer the productions, the faster the analyzer.
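
The equivalence can also be checked by brute force for short sentential forms. The Python sketch below enumerates everything both canonical forms derive up to a given length. It is an illustration written for this text, not part of Anglr: the function name derive, the abbreviations E (additive-expression) and m (multiplicative-expression), and the helper rule op standing for the nested ( '+' | '-' ) are all assumptions of the sketch.

```python
def derive(rules, start, max_len):
    """Return all terminal strings of at most max_len symbols derived from start.

    rules maps a rule name to a list of productions (lists of symbols);
    a symbol without an entry in rules is treated as a terminal symbol.
    Assumption: every recursive production adds a terminal symbol, so
    pruning by the number of terminals keeps the search finite.
    """
    results, seen, pending = set(), {(start,)}, [(start,)]
    while pending:
        form = pending.pop()
        index = next((i for i, s in enumerate(form) if s in rules), None)
        if index is None:
            results.add("".join(form))
            continue
        for production in rules[form[index]]:
            new_form = form[:index] + tuple(production) + form[index + 1:]
            terminals = sum(1 for s in new_form if s not in rules)
            if terminals <= max_len and new_form not in seen:
                seen.add(new_form)
                pending.append(new_form)
    return results


# Canonical form of the first definition, duplicated production kept on purpose.
FIRST = {"E": [["m"], ["E", "+", "m"], ["m"], ["E", "-", "m"]]}
# Canonical form of the second definition with the helper rule "op".
SECOND = {"E": [["m"], ["E", "op", "m"]], "op": [["+"], ["-"]]}

first = derive(FIRST, "E", 5)
second = derive(SECOND, "E", 5)
print(first == second)      # True
print("m+m-m" in first)     # True
```

A check like this covers only sentential forms up to the chosen length, so it supports the argument above but does not replace it.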

Example

Below is a complete example of an anglr file.

[ Description Text='definitions of tokens and regular expressions used to define syntax']
[ Description Text='of simple arithmetic expressions']
[ CompilationInfo ClassName='MathDecls' NameSpace='Math.Declarations' Access='public']
%declarations mathDecls
%{
    %regex
    {
        decimal-digit [0-9]
        number {decimal-digit}+
    }

    %terminal
    {
        NUMBER
        add '+'
        sub '-'
        mul '*'
        div '/'
        lb '('
        rb ')'
        unknown
    }
%}

[ Description Text='definition of scanner, which extracts comments from input string']
[ Declarations Id='mathDecls' ]
[ CompilationInfo ClassName='CommentRegex' NameSpace='Math.ScannerLib' Access='public']
%scanner commentScanner
%{
[\*]+\/
    pop
[\n\r]
    skip
[^\*]+
    skip
[\*]+
    skip
%}

[ Description Text='definition of scanner, which extracts terminal symbols from input string']
[ Declarations Id='mathDecls' ]
[ CompilationInfo ClassName='MathRegex' NameSpace='Math.ScannerLib' Access='public']
%scanner mathScanner
%{
\/\*
    push commentScanner
{number}
    terminal NUMBER
\+
    terminal add
\-
    terminal sub
\*
    terminal mul
\/
    terminal div
\(
    terminal lb
\)
    terminal rb
[ \t]+
    skip
[\n\r]
    skip
.
    skip
%}

[ Description Text='Lexer for anglr file' Hover='true' ]
[
    UseScanner
        ScannerId='commentScanner'
        InitialScanner='mathScanner'
        Hover='true'
]
[ CompilationInfo ClassName='MathLexer' NameSpace='Math.Lexer' Access='public' Hover='true' ]
%lexer mathLexer
%{

%}

[ Description Text='definition of parser for simple arithmetic expressions']
[ Declarations Id='mathDecls' ]
[ Lexer Id='mathLexer' ]
[ CompilationInfo ClassName='MathParser' NameSpace='Math.Parser' Access='public']
%parser mathParser1
%{

[ Start ]
expression
    : additive-expression
    ;

additive-expression
    : multiplicative-expression
    | additive-expression '+' multiplicative-expression
    | additive-expression '-' multiplicative-expression
    ;

multiplicative-expression
    : unary-expression
    | multiplicative-expression '*' unary-expression
    | multiplicative-expression '/' unary-expression
    ;

unary-expression
    : NUMBER
    | '(' expression ')'
    ;


%}

The above example illustrates several points that are worth remembering:

Operator Priority
	The rules are designed to take into account the priority of operators: addition and subtraction have the same priority, which is lower than that of multiplication and division. Multiplication and division also share the same priority, which is in turn lower than that of nested arithmetic expressions enclosed in parentheses. This technique is worth remembering because Anglr has no dedicated mechanism for declaring the priority of operators used in programming languages.
Recursive Rules
	In the example above, some syntax rules are recursive. These rules are additive-expression and multiplicative-expression. These two rules are explicitly recursive, as their names appear directly in the productions with which they are defined. Implicitly, all the rules in this example are recursive, since each of them refers to itself indirectly through other rules.
Left recursive rules
	The rules additive-expression and multiplicative-expression are left recursive because their names appear at the left end of the productions with which they are defined. If possible, let all recursive rules in Anglr be left recursive. This affects the functioning of the syntax analyzer generated by the Anglr compiler. The Anglr compiler generates so-called LR analyzers. They are very effective at handling left recursive rules, but with right recursive rules the results can be disastrous. Recursive rules typically define sequences of syntactically identical elements. If the syntax rules for these sequences are left recursive, the generated syntax analyzer can process the elements as they arrive, one by one. However, if the syntax rules for these sequences are right recursive, the syntax analyzer must remember all elements in the sequence and can only process them at the end. If the sequence is very long, the syntax analyzer may run out of memory.
Terminal Symbols
	In the above example, there are things that are not defined by any syntax rule. These are: '+', '-', '*', '/', '(', ')', and NUMBER. These are terminal symbols. As we will see later, these elements must be defined somewhere. For '+', '-', '*', '/', '(', and ')' this is not strictly necessary, because we can see from the notation itself what their content is. However, the content of the NUMBER terminal symbol is not known, so it must be defined elsewhere. At this point it should be noted that the arithmetic expressions defined by the syntax rules above consist exclusively of the terminal symbols found in these syntax rules. This is a general rule: each language (or each sentence in this language) consists only of those terminal symbols which appear in the syntax rules by which it is defined.
Start Rule
	The expression syntax rule is marked with the [ Start ] attribute. This means that expression is the rule that defines all valid arithmetic expressions built according to the rules in the example. This is a general rule: all valid sentences of a language are derived by its start rule. For example, the following expressions are valid under this rule: 1, 1 + 2 and 1 * 2 + 3. Now imagine that unary-expression had been chosen as the start rule in the above example. Then 1 + 2 and 1 * 2 + 3 would no longer be valid arithmetic expressions. However, the same expressions would be valid if they were enclosed in parentheses: (1 + 2) and (1 * 2 + 3). It is therefore worth remembering that the start rule derives the language: the set of valid sentences of that language.
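
The effect of the start rule can be tried out with a small recognizer. The Python sketch below is a hand-written recursive descent recognizer for the same arithmetic expressions. It is only an illustration: the Anglr compiler generates LR analyzers, not code like this, and the function names tokenize and accepts are made up. The tokenizer is also simplified; it rejects unknown characters and does not handle comments.

```python
import re

TOKEN = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(text):
    """Split text into NUMBER and single-character tokens; None on bad input."""
    tokens, position = [], 0
    text = text.rstrip()
    while position < len(text):
        match = TOKEN.match(text, position)
        if match is None:
            return None
        tokens.append(match.group(1))
        position = match.end()
    return tokens


def accepts(text, start="expression"):
    """Check whether text is a sentence derived by the chosen start rule."""
    tokens = tokenize(text)
    if tokens is None:
        return False
    position = 0

    def peek():
        return tokens[position] if position < len(tokens) else None

    def sequence(element, separators):
        # element + [ separators ]: one element, then separator/element pairs.
        nonlocal position
        if not element():
            return False
        while peek() in separators:
            position += 1
            if not element():
                return False
        return True

    def unary():
        # unary-expression : NUMBER | '(' expression ')'
        nonlocal position
        token = peek()
        if token is not None and token.isdigit():
            position += 1
            return True
        if token == "(":
            position += 1
            if additive() and peek() == ")":
                position += 1
                return True
        return False

    def multiplicative():
        return sequence(unary, ("*", "/"))

    def additive():
        return sequence(multiplicative, ("+", "-"))

    rules = {"expression": additive, "unary-expression": unary}
    # The whole input must be consumed by the start rule.
    return rules[start]() and position == len(tokens)


print(accepts("1 * 2 + 3"))                              # True
print(accepts("1 * 2 + 3", start="unary-expression"))    # False
print(accepts("(1 * 2 + 3)", start="unary-expression"))  # True
```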

[ Description Text='compact definition of parser for simple arithmetic expressions']
[ Description Text='using iterators']
[ Declarations Id='mathDecls' ]
[ Lexer Id='mathLexer' ]
[ CompilationInfo ClassName='MathParser' NameSpace='Math.Parser' Access='public']
%parser mathParser2
%{

[ Start ]
expression
    : additive-expression
    ;

additive-expression
    : multiplicative-expression + [ '+' ]
    | multiplicative-expression + [ '-' ]
    ;

multiplicative-expression
    : unary-expression + [ '*' ]
    | unary-expression + [ '/' ]
    ;

unary-expression
    : NUMBER
    | '(' expression ')'
    ;

%}

[ Description Text='more compact definition of parser for simple']
[ Description Text='arithmetic expressions using iterators']
[ Declarations Id='mathDecls' ]
[ Lexer Id='mathLexer' ]
[ CompilationInfo ClassName='MathParser' NameSpace='Math.Parser' Access='public']
%parser mathParser3
%{

[ Start ]
expression
    : additive-expression
    ;

additive-expression
    : multiplicative-expression + [ '+' | '-' ]
    ;

multiplicative-expression
    : unary-expression + [ '*' | '/' ]
    ;

unary-expression
    : NUMBER
    | '(' expression ')'
    ;
%}

[ Description Text='compact definition of parser for simple arithmetic expressions']
[ Description Text='using iterators and nested syntax rules']
[ Declarations Id='mathDecls' ]
[ Lexer Id='mathLexer' ]
[ CompilationInfo ClassName='MathParser' NameSpace='Math.Parser' Access='public']
%parser mathParser4
%{

[ Start ]
expression
    : ( : additive-expression :
            ( : multiplicative-expression :
                ( : unary-expression :
                    NUMBER
                | '(' expression ')'
                ) + [ '*' | '/' ]
            ) + [ '+' | '-' ]
        )
    ;

%}

[ Description Text='compact definition of parser for simple arithmetic expressions']
[ Description Text='using iterators and anonymous nested syntax rules']
[ Declarations Id='mathDecls' ]
[ Lexer Id='mathLexer' ]
[ CompilationInfo ClassName='MathParser' NameSpace='Math.Parser' Access='public']
%parser mathParser5
%{

[ Start ]
expression
    : (
            (
                (
                    NUMBER
                | '(' expression ')'
                ) + [ '*' | '/' ]
            ) + [ '+' | '-' ]
        )
    ;

%}

[ Description Text='single line compact definition of parser for simple arithmetic']
[ Description Text='expressions using iterators and anonymous nested syntax rules']
[ Declarations Id='mathDecls' ]
[ Lexer Id='mathLexer' ]
[ CompilationInfo ClassName='MathParser' NameSpace='Math.Parser' Access='public']
%parser mathParser6
%{

[ Start ]
expression : ((( NUMBER | '(' expression ')' ) + [ '*' | '/' ] ) + [ '+' | '-' ] );

%}

Conclusion

Now that we know the basic procedures for creating syntax rules, we can learn more precisely how to write the contents of an anglr file, so that we can successfully create a working syntax analyzer with its help.
