Parser Part of Anglr File

Introduction

In this section, we specify the syntax rules for the syntax analyzer that we want to produce. The text analyzed by this syntax analyzer must comply with these rules. The descriptions in this section may seem confusing at first glance because the rules describe themselves. However, they are accompanied by examples that make the content easier to understand.

Syntax

The following are syntax rules that are used to create syntax rules for the syntax analyzer.

RULE P-1

<parser part>
    : <attribute list> ? '%parser' <identifier> '%{' <anglr syntax rule list> ? '%}'
    ;

RULE P-2

<anglr syntax rule list>
    : <anglr syntax rule> +
    ;

RULE P-3

<anglr syntax rule>
    : <attribute list> ? <identifier> ':' <anglr syntax production list> ';'
    | <attribute list> ? <identifier> '{' <anglr syntax rule list> ? '}'
    ;

RULE P-4

<anglr nested rule>
    : <anglr syntax production list name> ? <anglr syntax production list>
    ;

RULE P-5

<anglr syntax production list name>
    : ':' <identifier> ':'
    ;

RULE P-6

<anglr syntax production list>
    : <anglr syntax production> + [ '|' ]
    ;

RULE P-7

<anglr syntax production>
    : <production name> ? <name list>
    | '%empty'
    ;

RULE P-8

<production name>
    : '@@' <identifier>
    ;

RULE P-9

<name list>
    : <g name> - [ <marker list> ? ]
    ;

RULE P-10

<marker list>
    : <marker> +
    ;

RULE P-11

<marker>
    : '@' <identifier>
    ;

RULE P-12

<g name>
    : <name>
    | '(' <anglr nested rule> ')'
    | <g name> <cardinality delimiter>
    ;

RULE P-13

<name>
    : <any>
    | <cstring>
    | <identifier>
    ;

RULE P-14

<cardinality delimiter>
    : <cardinality> <delimiter> ?
    ;

RULE P-15

<cardinality>
    : '?'
    | '+'
    | '-'
    | '*'
    | '/'
    | '~+'
    | '~-'
    | '~*'
    | '~/'
    | '{' <number> ? ',' <number> ? '}'
    ;

RULE P-16

<delimiter>
    : '[' <anglr nested rule> ']'
    ;

Introductory Examples

We will refer to these examples wherever the rules for writing syntax rules need clarification. The examples consist of several parser parts. All parts define equivalent syntax rules, since they derive the same language: a set of simple arithmetic expressions. These examples give insight into almost all expressive possibilities of the ANGLR language. Each subsequent set of syntax rules is just a more compact version of the previous one. In this way, we gradually move from the initial, simple rules, written out in full, to a compact form that fits on a single line.

Example 1:

[ Description Text='definition of parser for simple arithmetic expressions']
[ Declarations Id='mathDecls' ]
[ Lexer Id='mathLexer' ]
[ CompilationInfo ClassName='MathParser' NameSpace='Math.Parser' Access='public']
%parser mathParser1
%{

[ Start ]
expression
    : additive-expression
    ;

additive-expression
    : multiplicative-expression
    | additive-expression '+' multiplicative-expression
    | additive-expression '-' multiplicative-expression
    ;

multiplicative-expression
    : unary-expression
    | multiplicative-expression '*' unary-expression
    | multiplicative-expression '/' unary-expression
    ;

unary-expression
    : NUMBER
    | '(' expression ')'
    ;

%}

Example 2:

[ Description Text='compact definition of parser for simple arithmetic expressions']
[ Description Text='using iterators']
[ Declarations Id='mathDecls' ]
[ Lexer Id='mathLexer' ]
[ CompilationInfo ClassName='MathParser' NameSpace='Math.Parser' Access='public']
%parser mathParser2
%{

[ Start ]
expression
    : additive-expression
    ;

additive-expression
    : multiplicative-expression + [ '+' ]
    | multiplicative-expression + [ '-' ]
    ;

multiplicative-expression
    : unary-expression + [ '*' ]
    | unary-expression + [ '/' ]
    ;

unary-expression
    : NUMBER
    | '(' expression ')'
    ;

%}

Example 3:

[ Description Text='more compact definition of parser for simple']
[ Description Text='arithmetic expressions using iterators']
[ Declarations Id='mathDecls' ]
[ Lexer Id='mathLexer' ]
[ CompilationInfo ClassName='MathParser' NameSpace='Math.Parser' Access='public']
%parser mathParser3
%{

[ Start ]
expression
    : additive-expression
    ;

additive-expression
    : multiplicative-expression + [ '+' | '-' ]
    ;

multiplicative-expression
    : unary-expression + [ '*' | '/' ]
    ;

unary-expression
    : NUMBER
    | '(' expression ')'
    ;
%}

Example 4:

[ Description Text='compact definition of parser for simple arithmetic expressions']
[ Description Text='using iterators and nested syntax rules']
[ Declarations Id='mathDecls' ]
[ Lexer Id='mathLexer' ]
[ CompilationInfo ClassName='MathParser' NameSpace='Math.Parser' Access='public']
%parser mathParser4
%{

[ Start ]
expression
    : ( : additive-expression :
            ( : multiplicative-expression :
                ( : unary-expression :
                    NUMBER
                | '(' expression ')'
                ) + [ '*' | '/' ]
            ) + [ '+' | '-' ]
        )
    ;

%}

Example 5:

[ Description Text='compact definition of parser for simple arithmetic expressions']
[ Description Text='using iterators and anonymous nested syntax rules']
[ Declarations Id='mathDecls' ]
[ Lexer Id='mathLexer' ]
[ CompilationInfo ClassName='MathParser' NameSpace='Math.Parser' Access='public']
%parser mathParser5
%{

[ Start ]
expression
    : (
            (
                (
                    NUMBER
                | '(' expression ')'
                ) + [ '*' | '/' ]
            ) + [ '+' | '-' ]
        )
    ;

%}

Example 6:

[ Description Text='single line compact definition of parser for simple arithmetic']
[ Description Text='expressions using iterators and anonymous nested syntax rules.']
[ Description Text='Syntax rules are nested to the depth of two. Every nested rule']
[ Description Text='is iterated with single iterator']
[ Declarations Id='mathDecls' ]
[ Lexer Id='mathLexer' ]
[ CompilationInfo ClassName='MathParser' NameSpace='Math.Parser' Access='public' CodeDir='mathParser6' ]
%parser mathParser6
%{

[ Start ]
expression : (( NUMBER | '(' expression ')' ) + [ '*' | '/' ] ) + [ '+' | '-' ] ;

%}

Example 7:

[ Description Text='single line compact definition of parser for simple arithmetic']
[ Description Text='expressions using iterators and single anonymous nested syntax']
[ Description Text='rule, which is iterated twice.']
[ Declarations Id='mathDecls' ]
[ Lexer Id='mathLexer' ]
[ CompilationInfo ClassName='MathParser' NameSpace='Math.Parser' Access='public' CodeDir='mathParser7' ]
%parser mathParser7
%{

[ Start ]
expression : ( NUMBER | '(' expression ')' ) + [ '*' | '/' ] + [ '+' | '-' ] ;

%}

Discussion

RULE P-1 - Structure of Parser Part

Rule RULE P-1 defines the top-level structure of the parser part. This part is composed in a similar way to all other parts of an anglr file:

the part is preceded by a possibly empty attribute list
followed by the reserved word %parser and an identifier representing the parser part name
between the part braces %{ and %} is the content of the parser part: a list of syntax rules, which can also be empty.

Following this description, the structure of the parser part from Example 1 above can be broken down as follows:

the parser part is preceded by this attribute list:

[ Description Text='definition of parser for simple arithmetic expressions']
[ Declarations Id='mathDecls' ]
[ Lexer Id='mathLexer' ]
[ CompilationInfo ClassName='MathParser' NameSpace='Math.Parser' Access='public']

its name is:

%parser mathParser1

part's content is equal to:

%{

[ Start ]
expression
    : additive-expression
    ;

additive-expression
    : multiplicative-expression
    | additive-expression '+' multiplicative-expression
    | additive-expression '-' multiplicative-expression
    ;

multiplicative-expression
    : unary-expression
    | multiplicative-expression '*' unary-expression
    | multiplicative-expression '/' unary-expression
    ;

unary-expression
    : NUMBER
    | '(' expression ')'
    ;

%}

RULE P-2 - List of Syntax Rules

Rule RULE P-2 defines the list of syntax rules: it is a nonempty sequence of syntax rules with no special separators between them. Everything between the part delimiters %{ and %} is a list of syntax rules. This must be understood in a broader sense, because there can be comments between the rules, and rules can be grouped, as we will see in the discussion of the next rule. A simple example of a syntax rule list was given in the discussion of the previous rule:

[ Start ]
expression
    : additive-expression
    ;

additive-expression
    : multiplicative-expression
    | additive-expression '+' multiplicative-expression
    | additive-expression '-' multiplicative-expression
    ;

multiplicative-expression
    : unary-expression
    | multiplicative-expression '*' unary-expression
    | multiplicative-expression '/' unary-expression
    ;

unary-expression
    : NUMBER
    | '(' expression ')'
    ;

RULE P-3 - Syntax Rule and Block of Syntax Rules

Rule RULE P-3 defines two kinds of syntax rules:

The first production
```
    <attribute list> ? <identifier> ':' <anglr syntax production list> ';'
```
defines the outer structure of the syntax rule
- it is preceded by a possibly empty attribute list
- next comes the syntax rule name
- between the colon and semicolon characters is the list of productions for that syntax rule
The <identifier> representing the syntax rule name is actually the definition of a non-terminal symbol. Non-terminal symbols cannot be declared in any other way, although they can be defined multiple times, so that the productions of a syntax rule are specified by a set of equally named syntax rules. For example:
```
additive-expression
    : multiplicative-expression
    | additive-expression '+' multiplicative-expression
    | additive-expression '-' multiplicative-expression
    ;
                            
```
- syntax rule has no attribute list
- name of syntax rule is additive-expression
- it has the following list of productions:
```
      multiplicative-expression
    | additive-expression '+' multiplicative-expression
    | additive-expression '-' multiplicative-expression
                                    
```
  note that the leading colon and trailing semicolon characters do not belong to the list of productions
Syntax rule additive-expression can also be written in the following way:
```
additive-expression
    : multiplicative-expression
    ;
additive-expression
    : additive-expression '+' multiplicative-expression
    ;
additive-expression
    : additive-expression '-' multiplicative-expression
    ;
                            
```
In general: the definition of a syntax rule can be spread across the anglr file (more precisely: across a single parser part).
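A minimal Python sketch of how equally named rule definitions can be viewed as one rule. The data layout (a list of `(name, productions)` pairs) and the function name `merge_rules` are assumptions made for illustration; they are not part of anglr.

```python
def merge_rules(definitions):
    """Merge equally named rule definitions, keeping productions in source order.

    definitions: list of (rule name, [production text, ...]) pairs.
    This layout is a hypothetical model, not the anglr compiler's own.
    """
    merged = {}
    for name, productions in definitions:
        merged.setdefault(name, []).extend(productions)
    return merged


# The rule written as three separate definitions ...
split_form = [
    ("additive-expression", ["multiplicative-expression"]),
    ("additive-expression", ["additive-expression '+' multiplicative-expression"]),
    ("additive-expression", ["additive-expression '-' multiplicative-expression"]),
]
# ... and as a single definition with three productions.
single_form = [
    ("additive-expression", [
        "multiplicative-expression",
        "additive-expression '+' multiplicative-expression",
        "additive-expression '-' multiplicative-expression",
    ]),
]

print(merge_rules(split_form) == merge_rules(single_form))
```

Under this model both spellings yield the same production list for additive-expression.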
The second production
```
    <attribute list> ? <identifier> '{' <anglr syntax rule list> ? '}'
                            
```
is something completely different: it is the definition of a group of syntax rules. The structure of a syntax rule group is very similar to the structure of the parser part itself:
- it is preceded by a possibly empty attribute list
- it has a name, although without the keyword %parser
- the content of a group is the same as that of a parser part: a possibly empty list of syntax rules (and other, nested groups). The difference is in the group braces, which are { and }

Syntax rule groups can be introduced very easily: select any number of syntax rules and enclose them in curly braces preceded by a group name and a possibly empty attribute list. For example, take the parser part above. Selecting the rules additive-expression and multiplicative-expression, enclosing them in curly braces and giving the group a name (here Expressions), we get a new group as shown below:

%{

[ Start ]
expression
    : additive-expression
    ;

Expressions
{
additive-expression
    : multiplicative-expression
    | additive-expression '+' multiplicative-expression
    | additive-expression '-' multiplicative-expression
    ;

multiplicative-expression
    : unary-expression
    | multiplicative-expression '*' unary-expression
    | multiplicative-expression '/' unary-expression
    ;
}

unary-expression
    : NUMBER
    | '(' expression ')'
    ;

%}

Note that introducing groups does not change the syntax of the language. It only adds another level of organization to the anglr file. Since groups can be nested, they allow the syntax rules in the anglr file to be organized hierarchically.

RULE P-4 - Nested Syntax Rule

Rule RULE P-4 introduces the concept of a nested syntax rule. The syntax rules that we have seen in the previous examples are so-called standalone syntax rules. Nested rules are very similar to standalone rules, except that they are always defined inside other rules. They have a slightly different syntax:

they need not have a name. A name is mandatory in standalone rules.
they have no leading ':' and trailing ';'. These characters are mandatory in standalone rules.

But they have the same body, composed of productions separated by the '|' character. Nested syntax rules are covered in more detail later, in the discussion of syntax rule structure and cardinality operators. Using nested rules, the parser part above can be written in this way:

%{

[ Start ]
expression
    : additive-expression
    ;

additive-expression
    : multiplicative-expression
    | additive-expression ( '+' | '-' ) multiplicative-expression
    ;

multiplicative-expression
    : unary-expression
    | multiplicative-expression ( '*' | '/' ) unary-expression
    ;

unary-expression
    : NUMBER
    | '(' expression ')'
    ;

%}

In the above example we introduced two nested rules:

'+' | '-'
'*' | '/'

Round brackets ( and ) are operators which introduce a nested syntax rule in the body of a syntax rule production.

RULE P-5 - Nested Syntax Rule Name

Rule RULE P-5 defines the name of a nested syntax rule. It is an identifier enclosed between two ':' characters.

The nested rules from the previous example are unnamed. In the simplest case, when a nested syntax rule consists of only one production and this production consists of one symbol (terminal or non-terminal), this is not a problem. In more complex cases like those shown above, the anglr compiler will generate names for those nested rules. If we intend to reference the source code generated by the anglr compiler for these productions, it is always better to give names to the nested syntax rules. With named nested rules, the example above can be rewritten in this way:

%{

[ Start ]
expression
    : additive-expression
    ;

additive-expression
    : multiplicative-expression
    | additive-expression ( : add-operators : '+' | '-' ) multiplicative-expression
    ;

multiplicative-expression
    : unary-expression
    | multiplicative-expression ( : mul-operators : '*' | '/' ) unary-expression
    ;

unary-expression
    : NUMBER
    | '(' expression ')'
    ;

%}

We have now introduced two named nested rules:

: add-operators : '+' | '-'
: mul-operators : '*' | '/'

They are equivalent to these standalone rules:

add-operators
    : '+'
    | '-'
    ;

mul-operators
    : '*'
    | '/'
    ;

This is always the case: named nested rules can easily be transformed into standalone form and vice versa. Using standalone rules, the example above can be rewritten in this way:

%{

[ Start ]
expression
    : additive-expression
    ;

additive-expression
    : multiplicative-expression
    | additive-expression add-operators multiplicative-expression
    ;

multiplicative-expression
    : unary-expression
    | multiplicative-expression mul-operators unary-expression
    ;

unary-expression
    : NUMBER
    | '(' expression ')'
    ;

add-operators
    : '+'
    | '-'
    ;

mul-operators
    : '*'
    | '/'
    ;

%}
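The transformation from named nested rules to standalone rules can be sketched in Python. This is only an illustration under simplifying assumptions: the function name `hoist_named_rules` is hypothetical, only innermost nested rules are handled, and round brackets or '|' characters inside character strings are not supported.

```python
import re

# Matches "( : name : body )" where body contains no round brackets.
# Assumption: quoted '(' , ')' and '|' do not occur inside the nested rule.
NAMED_NESTED = re.compile(r"\(\s*:\s*([A-Za-z][\w-]*)\s*:\s*([^()]*?)\s*\)")


def hoist_named_rules(production):
    """Replace named nested rules by their names.

    Returns (rewritten production, {rule name: [production text, ...]}).
    """
    standalone = {}

    def replace(match):
        name, body = match.group(1), match.group(2)
        standalone[name] = [alternative.strip() for alternative in body.split("|")]
        return name

    return NAMED_NESTED.sub(replace, production), standalone


flat, rules = hoist_named_rules(
    "additive-expression ( : add-operators : '+' | '-' ) multiplicative-expression"
)
print(flat)
print(rules)
```

For the production shown, the rewritten text references add-operators by name, and the extracted rule carries the two productions '+' and '-'.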

RULE P-6 - List of Productions

Rule RULE P-6 defines the structure of the syntax rule body: the list of syntax rule productions. It is a nonempty sequence of productions, delimited by the '|' character. Given any syntax rule, we can very easily extract the production list from it: remove the syntax rule name, the leading : and the trailing ;, and the remainder is the list of syntax rule productions. If we look again at Example 1, we can observe these production lists:

expression has production list composed of only one production:

      additive-expression

additive-expression has production list with three productions:

      multiplicative-expression
    | additive-expression '+' multiplicative-expression
    | additive-expression '-' multiplicative-expression

multiplicative-expression has also three productions:

      unary-expression
    | multiplicative-expression '*' unary-expression
    | multiplicative-expression '/' unary-expression

unary-expression has production list with two productions

      NUMBER
    | '(' expression ')'

Note that the layout of a syntax rule is not important. Specifically, it is not necessary to write each production of a syntax rule on its own line. It is perfectly fine to write the whole syntax rule on a single line or in whatever form one likes. For example, the syntax rules of the example above can be written in the following way:

[ Start ] expression : additive-expression ;
additive-expression : multiplicative-expression | additive-expression '+' multiplicative-expression | additive-expression '-' multiplicative-expression ;
multiplicative-expression : unary-expression | multiplicative-expression '*' unary-expression | multiplicative-expression '/' unary-expression ;
unary-expression : NUMBER | '(' expression ')' ;

RULE P-7 - Structure of Production

Rule RULE P-7 defines the overall structure of a syntax rule production. There are two kinds of productions:

Nonempty production, defined by the first production of syntax rule RULE P-7:
```
    <production name> ? <name list>
```
A nonempty production is composed in the following way:
- it can be named, although a name is not mandatory.
- the name (or nothing, if the production has no name) is followed by a non-empty list consisting mostly of terminal and non-terminal symbols, but there can also be other elements in this list. The nature of the elements in this list is explained later.
and empty production, defined by the second production of syntax rule RULE P-7:
```
    '%empty'
```
The keyword %empty can be used only in this context and labels a production which always derives the empty string.

Syntax rules which contain empty productions can derive empty strings. Consequently, non-empty productions can also derive empty strings if they are composed solely of non-terminal symbols representing syntax rules which derive empty strings.
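The propagation of empty strings through non-empty productions can be sketched as a fixed-point computation. This is a generic illustration, not the anglr compiler's implementation; the grammar representation and the rule names opt-sign, opt-signs and number are hypothetical.

```python
def nullable_rules(grammar):
    """Return the set of rule names that can derive the empty string.

    grammar maps a rule name to a list of productions; each production is a
    list of symbols, and an empty list stands for %empty (assumed encoding).
    """
    nullable = set()
    changed = True
    while changed:
        changed = False
        for name, productions in grammar.items():
            if name in nullable:
                continue
            # all() over an empty production is True, which covers %empty.
            if any(all(symbol in nullable for symbol in production)
                   for production in productions):
                nullable.add(name)
                changed = True
    return nullable


grammar = {
    "opt-sign": [[], ["'+'"], ["'-'"]],      # contains %empty
    "opt-signs": [["opt-sign", "opt-sign"]],  # non-empty production, still nullable
    "number": [["opt-sign", "NUMBER"]],       # NUMBER is a terminal: not nullable
}
print(sorted(nullable_rules(grammar)))
```

Here opt-signs has no %empty production of its own, yet it derives the empty string because every symbol in its production does.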

If we again look at the example above, we can see that syntax rules of this example contain these productions:

expression has only one production:
- additive-expression
additive-expression has three productions:
- multiplicative-expression
- additive-expression '+' multiplicative-expression
- additive-expression '-' multiplicative-expression
multiplicative-expression has also three productions:
- unary-expression
- multiplicative-expression '*' unary-expression
- multiplicative-expression '/' unary-expression
unary-expression has two productions:
- NUMBER
- '(' expression ')'

These productions are very simple, since they are composed solely of terminal and non-terminal symbols. But we will see later in the discussion of the parser part that there are also other constituent elements of productions, which are introduced together with nested rules and cardinality operators. However, the anglr compiler always maps these elements into basic elements: terminal and non-terminal symbols.

RULE P-8 - Production Name

Rule RULE P-8 defines the structure of a production name. It is an identifier preceded by a double at sign '@@'.

We can give names to some productions in this example and we get:

%{

[ Start ]
expression
    : additive-expression
    ;

additive-expression
    : multiplicative-expression
    | @@addition additive-expression '+' multiplicative-expression
    | @@subtraction additive-expression '-' multiplicative-expression
    ;

multiplicative-expression
    : unary-expression
    | @@multiplication multiplicative-expression '*' unary-expression
    | @@division multiplicative-expression '/' unary-expression
    ;

unary-expression
    : NUMBER
    | '(' expression ')'
    ;

%}

Production names do not change the syntax of the rule. Production names must be unique within a particular syntax rule, but they can be repeated in other rules.

The following is an example of illegal use of production names, because productions within the same syntax rule must not have the same name:

additive-expression
    : multiplicative-expression
    | @@arithmetic-expression additive-expression '+' multiplicative-expression
    | @@arithmetic-expression additive-expression '-' multiplicative-expression
    ;

This example also uses production names illegally, because productions within the same syntax rule must not have the same name, even when the rule is specified by two distinct, equally named definitions:

additive-expression
    : multiplicative-expression
    | @@arithmetic-expression additive-expression '+' multiplicative-expression
    ;
additive-expression
    : multiplicative-expression
    | @@arithmetic-expression additive-expression '-' multiplicative-expression
    ;

But this example is perfectly OK, since the productions with the same names are used in different syntax rules:

additive-expression
    : multiplicative-expression
    | @@name-one additive-expression '+' multiplicative-expression
    | @@name-two additive-expression '-' multiplicative-expression
    ;

multiplicative-expression
    : unary-expression
    | @@name-one multiplicative-expression '*' unary-expression
    | @@name-two multiplicative-expression '/' unary-expression
    ;

RULE P-9 - List of Production Building Blocks

Rule RULE P-9 defines the structure of a syntax rule production, the <name list>. It represents a production without its name (if it has one). It is a nonempty sequence of basic building blocks surrounded by (possibly empty) lists of positional markers. In the simplest case these building blocks are terminal symbols (or their textual representations) and non-terminal symbols. In more advanced scenarios these building blocks can also be nested syntax rules, or any of these building blocks combined with cardinality operators. Before the ANGLR compiler translates these rules, it converts them to a basic format that contains only terminal and non-terminal symbols. Nested rules are replaced by non-terminal symbols (nested rule names), and cardinality operators are converted according to rules specifying how they are represented in the basic form containing only terminal and non-terminal symbols.

Here is an example of production with very simple structure:

    @@addition additive-expression '+' multiplicative-expression

It contains only terminal and non-terminal symbols (note that production name @@addition is not part of <name list>):

non-terminal symbol additive-expression
character string representation of terminal symbol '+'
and another non-terminal symbol multiplicative-expression

And here is another example of production building block list, which contains also positional markers:

    @@field valuefieldreference-token Type UNIQUE ? @push-governor ValueOptionalitySpec ? @pop-governor

It contains these building blocks:

non-terminal symbol valuefieldreference-token
non-terminal symbol Type
cardinality expression UNIQUE ?
and another cardinality expression ValueOptionalitySpec ?

It contains the following positional markers:

@push-governor
@pop-governor

Note that positional markers do not change the meaning of a syntax rule production.

This production is more advanced:

    additive-expression ( : add-operators : '+' | '-' ) multiplicative-expression

It is also composed of three basic building blocks:

non-terminal symbol additive-expression
nested syntax rule : add-operators : '+' | '-' enclosed within round brackets ( and ).
and another non-terminal symbol multiplicative-expression

Here is an example of even more advanced production:

    (( : number : NUMBER | '+' number | '-' number ) | '(' expression ')' ) + [ '<<' | '>>' ] + [ '*' | '/' ] + [ '+' | '-' ]

It is composed of only one building block: a series of three cardinality operators applied to a nested syntax rule (which is itself composed of a nested rule and a simple production):

    ( : number : NUMBER | '+' number | '-' number ) | '(' expression ')'

It is an interesting example of how to introduce arithmetic operator priority:

shift operators << and >> have the highest priority
multiplication and division operators ( * and / respectively) have lower priority
and finally addition and subtraction ( + and - respectively) have the lowest priority
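The effect of stacking iterators can be illustrated with a small hand-written evaluator in Python. It is only a sketch of the priority scheme listed above, not code generated by the anglr compiler: the names `LEVELS` and `evaluate` are hypothetical, tokens are assumed to be already split into a list, and '/' is assumed to mean integer division.

```python
import operator

# Delimiter sets ordered from the innermost iterator (binds tightest) outward.
LEVELS = [
    {"<<": operator.lshift, ">>": operator.rshift},
    {"*": operator.mul, "/": operator.floordiv},  # assumption: integer division
    {"+": operator.add, "-": operator.sub},
]


def evaluate(tokens):
    """Evaluate a token list of ints, operator strings and round brackets."""
    pos = 0

    def number():
        # number : NUMBER | '+' number | '-' number
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "+":
            return number()
        if token == "-":
            return -number()
        return token

    def operand():
        # number | '(' expression ')'
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1
            value = level(len(LEVELS) - 1)
            pos += 1  # skip the closing ')'
            return value
        return number()

    def level(i):
        # One iterator: operands of level i-1 separated by delimiters of level i.
        nonlocal pos
        if i < 0:
            return operand()
        value = level(i - 1)
        while pos < len(tokens) and tokens[pos] in LEVELS[i]:
            apply = LEVELS[i][tokens[pos]]
            pos += 1
            value = apply(value, level(i - 1))
        return value

    return level(len(LEVELS) - 1)


print(evaluate([1, "+", 1, "<<", 3]))
print(evaluate(["(", 1, "+", 2, ")", "*", 3]))
```

Because the shift delimiters belong to the innermost iterator, 1 + 1 << 3 groups as 1 + (1 << 3) in this scheme.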

RULE P-10 - List of Positional Markers

Rule RULE P-10 defines the list of positional markers: a nonempty sequence of markers.

RULE P-11 - Positional Marker

Rule RULE P-11 defines a positional marker within a syntax rule production. Positional markers are zero-based indexes of the basic building blocks of a production. Their names must be unique within a particular production, but may be repeated in other productions, including productions within the same syntax rule. A production in which a marker appears must be named.

Here is an example of positional markers usage:

VariableTypeValueFieldSpec
    : @@field valuefieldreference-token FieldName @push-governor ValueOptionalitySpec ? @pop-governor
    ;

In the example above two positional markers are defined:

positional marker @push-governor surrounds FieldName from the right side and ValueOptionalitySpec ? from the left.
positional marker @pop-governor surrounds ValueOptionalitySpec ? from the right.

The example above shows that it is not always obvious which building block is surrounded by which positional marker. This is always a matter of interpretation: whoever introduces the markers knows what they surround and why.

Positional markers have no special meaning in syntax rules. The syntax rule is always the same, no matter how many positional markers it contains. The markers are connected to the generated code. The programmer introduces them to better address the semantics of that part of the source code governed by the particular syntax rule.

RULE P-12 - Kinds of Production Building Blocks

Rule RULE P-12 defines the different kinds of 'generalized' name, the basic building block of a production. It can be one of these:

<name>, like an identifier (representing terminal and non-terminal symbols) or character string (representing textual representations of terminal symbols).
'(' <anglr nested rule> ')', a nested syntax rule enclosed in round brackets ( and ).
<g name> <cardinality delimiter>, cardinality expressions: any of the above followed by any number of cardinality operators.

Note: <name>s are the sole building blocks used in canonical forms; the other building blocks (nested rules and cardinality expressions) are used to construct extended forms of syntax rules.

To identify the building blocks of a production, follow these steps:

first identify and ignore all positional markers and the production name
then identify all outermost nested syntax rules. To identify one, follow this procedure:
- start counting with count = 0
- whenever you find an opening round bracket, increment count: count = count + 1
- whenever you find a closing round bracket, decrement count: count = count - 1. If count reaches zero, you have found the end of an outermost nested syntax rule
Repeat this procedure as many times as needed. Don't count round brackets in character strings.
next identify all terminal and non-terminal symbols, and also character string representations of terminal symbols, which are not part of the nested syntax rules identified in the previous step.
all cardinality operators that appear to the right of an element identified above are part of that element and change its nature: it becomes a cardinality expression.
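The bracket-counting step of this procedure can be sketched in Python as follows. The function name `outermost_nested_rules` is hypothetical, and the sketch assumes that character strings are single-quoted and contain no escaped quotes.

```python
def outermost_nested_rules(production):
    """Return the text of every outermost '( ... )' group in a production.

    Round brackets inside single-quoted character strings are not counted.
    Assumption: character strings contain no escaped quotes.
    """
    groups = []
    count = 0
    start = None
    in_string = False
    for index, char in enumerate(production):
        if char == "'":
            in_string = not in_string
        elif in_string:
            continue
        elif char == "(":
            if count == 0:
                start = index
            count += 1
        elif char == ")":
            count -= 1
            if count == 0:
                groups.append(production[start:index + 1])
    return groups


production = (
    "(( : number : NUMBER | '+' number | '-' number ) | '(' expression ')' )"
    " + [ '<<' | '>>' ] + [ '*' | '/' ] + [ '+' | '-' ]"
)
for group in outermost_nested_rules(production):
    print(group)
```

The quoted '(' and ')' around expression are skipped, so the whole bracketed prefix is reported as a single outermost nested rule.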

Let's take an example:

    @@field valuefieldreference-token Type UNIQUE ? @push-governor ValueOptionalitySpec ? @pop-governor

If we ignore production name and positional markers, we get:

    valuefieldreference-token Type UNIQUE ? ValueOptionalitySpec ?

We can easily see that production contains the following terminal and nonterminal symbols:

valuefieldreference-token
Type
UNIQUE
ValueOptionalitySpec

Two of them, UNIQUE and ValueOptionalitySpec, have cardinality operators to their right, so we can conclude that the production above has the following building blocks:

non-terminal symbol valuefieldreference-token
non-terminal symbol Type
cardinality expression UNIQUE ?
and another cardinality expression ValueOptionalitySpec ?

Note: UNIQUE and ValueOptionalitySpec are not building blocks of production from above example. Cardinality expressions containing them are production's building blocks. This is a very important fact that we must take into account when using the source code generated by the ANGLR compiler.
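For flat productions like the one above, the grouping into building blocks can be sketched in Python. This is an illustration only: the function name `building_blocks` is hypothetical, tokens are assumed to be separated by whitespace, and nested rules, delimiters and the '{' ... '}' cardinality form are deliberately not handled.

```python
# Single-token cardinality operators from RULE P-15 (the '{n,m}' form is omitted).
CARDINALITY_OPERATORS = {"?", "+", "-", "*", "/", "~+", "~-", "~*", "~/"}


def building_blocks(production):
    """Group a flat production into building blocks.

    Assumption: no nested rules and no delimiters; tokens are whitespace-separated.
    """
    blocks = []
    for token in production.split():
        if token.startswith("@"):
            continue  # production name (@@name) or positional marker (@name)
        if token in CARDINALITY_OPERATORS and blocks:
            blocks[-1] += " " + token  # the operator belongs to the element on its left
        else:
            blocks.append(token)
    return blocks


production = (
    "@@field valuefieldreference-token Type UNIQUE ? "
    "@push-governor ValueOptionalitySpec ? @pop-governor"
)
print(building_blocks(production))
```

The result lists UNIQUE ? and ValueOptionalitySpec ? as whole blocks, never UNIQUE or ValueOptionalitySpec on their own.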

Now, take a look at the following example again:

    (( : number : NUMBER | '+' number | '-' number ) | '(' expression ')' ) + [ '<<' | '>>' ] + [ '*' | '/' ] + [ '+' | '-' ]

Following instructions for identification of nested syntax rule, we find the following facts:

(( : number : NUMBER | '+' number | '-' number ) | '(' expression ')' ) is the outermost nested syntax rule
+ [ '<<' | '>>' ] + [ '*' | '/' ] + [ '+' | '-' ] is the series of cardinality operators

We can conclude that the production consists of one building block: a cardinality expression built from the above nested syntax rule and the series of cardinality operators.

Let's write the above example in a slightly different but equivalent way:

    (((( : number : NUMBER | '+' number | '-' number ) | '(' expression ')' ) + [ '<<' | '>>' ] ) + [ '*' | '/' ] ) + [ '+' | '-' ]

Now we have the following partition of production:

(((( : number : NUMBER | '+' number | '-' number ) | '(' expression ')' ) + [ '<<' | '>>' ] ) + [ '*' | '/' ] )
+ [ '+' | '-' ]

The second form is a better way to write a syntax rule. The anglr compiler will silently translate the first form into the second one. Nevertheless, both forms have the same drawback: they introduce anonymous nested rules. Although the compiler will generate names for them, we don't know in advance what those names are. The compiler may also generate different names if we change the contents of the anglr file. Therefore, follow this advice: don't use anonymous nested syntax rules; give them names, as in the following example:

    : additive-expression : ( : multiplicative-expression : ( : shift-expression : ( : unary-expression : ( : number : NUMBER | '+' number | '-' number ) | '(' expression ')' ) + [ '<<' | '>>' ] ) + [ '*' | '/' ] ) + [ '+' | '-' ]

RULE P-13 - Basic Building Block of Production

Rule RULE P-13 defines <name>, the basic building block of any production. It can be:

<any>: this terminal symbol has no textual representation and cannot be produced by the lexical analyzer
<cstring>: textual representation of any terminal symbol, whether defined or not
<identifier>: identifier representing a terminal or non-terminal symbol. All of them must be defined

The compiler will convert every syntax rule, no matter how many nested rules it contains, how deeply these rules nest, or how many cardinality expressions it contains, into a format made up only of terminal and non-terminal symbols. Terminal symbols can also be represented by their textual representations if they have one, but this does not change things, because internally they are identical to the terminal symbols themselves.

Let's examine this syntax rule:

    (( : number : NUMBER | '+' number | '-' number ) | '(' expression ')' ) + [ '*' | '/' ] + [ '+' | '-' ]

The Anglr compiler will translate it into this form before generating code:

expression
	:	<generated-rule-2 set> 
	|	expression '-' <generated-rule-2 set> 
	|	expression '+' <generated-rule-2 set> 
	;

<generated-rule-2 set>
	:	<generated-rule-2> 
	|	<generated-rule-2 set> '/' <generated-rule-2> 
	|	<generated-rule-2 set> '*' <generated-rule-2> 
	;

<generated-rule-2>
	:	number 
	|	'(' expression ')' 
	;

number
	:	NUMBER 
	|	'+' number 
	|	'-' number 
	;

The syntax rules above contain only terminal and non-terminal symbols.
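The claim that the translated rules contain only terminal and non-terminal symbols can be checked mechanically. The following is a minimal Python sketch, not part of Anglr: the dictionary simply transcribes the translated rules shown above, and the helper names (`is_quoted`, `unresolved_symbols`) are hypothetical. It also assumes that NUMBER is a terminal symbol declared elsewhere in the anglr file.

```python
# Translated rules transcribed from the listing above.
# Each production is a list of symbols.
CANONICAL_RULES = {
    "expression": [
        ["<generated-rule-2 set>"],
        ["expression", "'-'", "<generated-rule-2 set>"],
        ["expression", "'+'", "<generated-rule-2 set>"],
    ],
    "<generated-rule-2 set>": [
        ["<generated-rule-2>"],
        ["<generated-rule-2 set>", "'/'", "<generated-rule-2>"],
        ["<generated-rule-2 set>", "'*'", "<generated-rule-2>"],
    ],
    "<generated-rule-2>": [
        ["number"],
        ["'('", "expression", "')'"],
    ],
    "number": [
        ["NUMBER"],
        ["'+'", "number"],
        ["'-'", "number"],
    ],
}

# Assumption: NUMBER is a terminal declared elsewhere in the anglr file.
DECLARED_TERMINALS = {"NUMBER"}


def is_quoted(symbol):
    """True for the textual representation of a terminal, such as '+'."""
    return len(symbol) >= 2 and symbol[0] == "'" and symbol[-1] == "'"


def unresolved_symbols(rules, terminals):
    """Collect symbols that are neither terminals nor defined non-terminals."""
    unresolved = set()
    for productions in rules.values():
        for production in productions:
            for symbol in production:
                known = is_quoted(symbol) or symbol in terminals or symbol in rules
                if not known:
                    unresolved.add(symbol)
    return unresolved
```

Calling `unresolved_symbols(CANONICAL_RULES, DECLARED_TERMINALS)` returns an empty set, which means that no nested rules or cardinality expressions are left after translation.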

RULE P-14 - Cardinality of Production Building Block

Rule RULE P-14 defines cardinality operators and delimiters, which are used to form strings of <name>s. Cardinality operators always operate on building blocks of productions, turning them into cardinality expressions, which are themselves building blocks of productions. Cardinality operators express specific kinds of recursive syntax rules in a much more compact form. Their use is highly encouraged: syntax rules that use cardinality operators are shorter and more understandable, and the generated code is usually faster.

Cardinality is composed of two parts:

cardinality operator
non-mandatory delimiter

A simple example of cardinality is the following:

+ [ '|' ]

It is composed of:

cardinality operator +
and delimiter [ '|' ]

RULE P-15 - Cardinality Operators

Rule RULE P-15 defines cardinality operators. They can be:

'?': the 'general' name may or may not appear in the syntax rule production.
'+' and '-': the 'general' name must be repeated at least once.
'*' and '/': the 'general' name may be repeated any number of times, including zero.

The operators above are discussed in detail in the introductory section of this web page, since we needed them to explain the topics presented there. However, the definitions set out in that section apply only to simple cases, with no delimiter or with a very simple one consisting of a single terminal symbol or its character representation. The discussion of syntax rule P-15 explains the general rules, which cover the general structure of cardinality delimiters.

RULE P-16 - Delimiter

Rule RULE P-16 defines the structure of delimiters. Delimiters are usually simple character strings like ',', '-', '|' etc. However, delimiters can be arbitrary nested syntax rules. The simple character strings mentioned above are just the simplest nested rules: each consists of a single production containing a single terminal symbol (in fact, the character string representation of a terminal symbol). In general, delimiters are nested rules enclosed in square brackets [ and ].

Unlike nested rules that appear in the productions of syntax rules, nested syntax rules in delimiters should be anonymous. The reason is simple: a nested rule that appears in a production is replaced by its name, which the compiler generates if the rule is anonymous. For a delimiter, on the other hand, the rule for creating the syntax rule that represents a particular cardinality operator is applied to every production of the nested syntax rule separately.

Let's take an example of a cardinality expression with a non-trivial nested syntax rule in its delimiter:

additive-expression
    : multiplicative-expression + [ '+' | '-' ]
    ;

If we strictly applied the rule for the cardinality operator + with a delimiter, we would expect the Anglr compiler to translate the syntax rule above into something like this:

additive-expression
    : multiplicative-expression
    | additive-expression '+' | '-' multiplicative-expression
    ;

However, this is not what one would want, because it is a strange definition of an arithmetic expression. When the nested rule consists of several productions, the Anglr compiler treats each of them separately, as if we had several separate cases. The example above is in fact divided into two separate cases. For every production of the nested syntax rule in the delimiter of a cardinality expression we get one cardinality expression:

additive-expression
    : multiplicative-expression + [ '+' ]
    ;
additive-expression
    : multiplicative-expression + [ '-' ]
    ;

Translating them into canonical form, we get these syntax rules:

additive-expression
    : multiplicative-expression
    | additive-expression '+' multiplicative-expression
    ;

additive-expression
    : multiplicative-expression
    | additive-expression '-' multiplicative-expression
    ;

or, equivalently:

additive-expression
    : multiplicative-expression
    | additive-expression '+' multiplicative-expression
    | additive-expression '-' multiplicative-expression
    ;

Now, let's imagine a more hypothetical example:

additive-expression
    : multiplicative-expression + [ A B | C D ]
    ;

Every production of the nested rule in the example above has two building blocks (terminal or non-terminal symbols). This example would be translated into the following canonical form:

additive-expression
    : multiplicative-expression
    | additive-expression A B multiplicative-expression
    | additive-expression C D multiplicative-expression
    ;

Note that A, B, C and D can be any building blocks: basic building blocks, nested syntax rules or cardinality expressions.
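The per-production treatment of delimiters described above can be sketched in a few lines of Python. This is only an illustration of the translation shown in this section, not the Anglr compiler's actual implementation. The function names `expand_plus_with_delimiter` and `format_rule` are hypothetical, and the sketch assumes that every building block has already been reduced to a single symbol name.

```python
def expand_plus_with_delimiter(rule_name, element, delimiter_productions):
    """Expand `rule_name : element + [ d1 | d2 | ... ]` into canonical form.

    Each delimiter production is a list of symbols. One left-recursive
    production is emitted per delimiter production, after the base case.
    """
    productions = [[element]]  # base case: a single element
    for delimiter in delimiter_productions:
        productions.append([rule_name, *delimiter, element])
    return productions


def format_rule(rule_name, productions):
    """Render productions in the layout used by the listings in this section."""
    lines = [rule_name]
    for index, production in enumerate(productions):
        lead = ":" if index == 0 else "|"
        lines.append("    " + lead + " " + " ".join(production))
    lines.append("    ;")
    return "\n".join(lines)


if __name__ == "__main__":
    rule = expand_plus_with_delimiter(
        "additive-expression",
        "multiplicative-expression",
        [["A", "B"], ["C", "D"]],
    )
    print(format_rule("additive-expression", rule))
```

Run as a script, the sketch prints the same four-line rule body as the canonical form of the A B | C D example. Passing `[["'+'"], ["'-'"]]` as the delimiter productions gives the merged additive-expression rule shown earlier.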