Declarations Source Code


Introduction

The Anglr compiler translates every declaration part found in an Anglr file into a set of source-code declarations. These declarations can later be used by the lexical and syntax analyzers and, more generally, by any software component that needs the information found in these parts. The generated code depends on the settings found in the CompilationInfo attribute that precedes the declaration part.

CompilationInfo attribute

The CompilationInfo attribute is a quasi-mandatory attribute of every declaration part: if it is not present, the information needed to generate the source code is taken from elsewhere, as noted below. As we already know, every attribute contains a set of name-value pairs. The CompilationInfo attribute can contain the following name-value pairs:

Name        | Value                            | Description
ClassName   | name of the generated class      | The Anglr compiler uses this name for the class that contains the definitions of the terminal symbols and regular expressions found in the declaration part of the Anglr file. If this setting is not present in the CompilationInfo attribute, the name of the declaration part is used as the class name.
NameSpace   | namespace of the generated class | The Anglr compiler uses this name for the namespace that contains the generated class. If this setting is not present in the CompilationInfo attribute, the NameSpace setting of the CompilationInfo attribute of the general part of the Anglr file is used instead, if there is one. Otherwise the name of the Anglr file is used as the namespace name.
Access      | access of the generated class    | One of three values: internal, public or private. internal and public are preferred, since private access makes the generated class inaccessible (C# does not even allow a class declared directly in a namespace to be private).
TokenPrefix | prefix of generated field names  | Sometimes the names of the fields in the generated class are not valid in the selected programming language. In that case it is useful to rename some or all of them, and this setting exists for exactly this purpose: it prepends the given prefix to every field name in the generated class, whether the field comes from a terminal symbol or from a regular expression. Selected terminal or regular-expression names cannot be renamed individually, except by renaming them in the Anglr file itself. In other words, this setting renames either all field names or none at all (when the setting is absent or its value is an empty string).
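The fallback rules in the table can be summarized in a few lines of C#. The following is a minimal sketch, not the Anglr compiler's own code: the class CompilationInfoRules and its methods are hypothetical, and it assumes that "the name of the Anglr file" means the file name without its directory and extension.

```csharp
#nullable enable
using System;
using System.IO;

// Hypothetical sketch of the CompilationInfo fallback rules; not Anglr's own code.
public static class CompilationInfoRules
{
    // ClassName: falls back to the name of the declaration part.
    public static string ResolveClassName(string? className, string declarationPartName) =>
        string.IsNullOrEmpty(className) ? declarationPartName : className;

    // NameSpace: the declaration part setting, then the general part setting,
    // then the Anglr file name (assumption: without directory and extension).
    public static string ResolveNamespace(string? partNamespace, string? generalNamespace, string anglrFilePath)
    {
        if (!string.IsNullOrEmpty(partNamespace)) return partNamespace;
        if (!string.IsNullOrEmpty(generalNamespace)) return generalNamespace;
        return Path.GetFileNameWithoutExtension(anglrFilePath);
    }

    // Access: only the three documented values are accepted.
    public static string ValidateAccess(string access)
    {
        if (access == "public" || access == "internal" || access == "private") return access;
        throw new ArgumentException($"unsupported Access value '{access}'");
    }

    // TokenPrefix: prepended to every field name; absent or empty means no renaming.
    public static string ApplyTokenPrefix(string? tokenPrefix, string fieldName) =>
        (tokenPrefix ?? string.Empty) + fieldName;
}
```

For the declaration part mathDecls used below, ResolveClassName("MathDecls", "mathDecls") yields MathDecls; without the ClassName setting the result would simply be mathDecls.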

Generated Code

The Anglr compiler assigns one class to every declaration part found in an Anglr file. The name of this class is given by the ClassName setting of the CompilationInfo attribute that precedes the declaration part (or, if that setting is missing, by the name of the declaration part). The easiest way to understand this is through an example. Consider the following declaration part:

[ Description Text='definitions of tokens and regular expressions used to define syntax']
[ Description Text='of simple arithmetic expressions']
[ CompilationInfo ClassName='MathDecls' NameSpace='Math.Declarations' Access='public']
%declarations mathDecls
%{
    %regex
    {
        decimal-digit [0-9]
        number {decimal-digit}+
        add \+
        sub \-
        mul \*
        div \/
        shl \<\<
        shr \>\>
        lb \(
        rb \)
    }

    %terminal
    {
        NUMBER
        add '+'
        sub '-'
        mul '*'
        div '/'
        shl '<<'
        shr '>>'
        lb '('
        rb ')'
        unknown
    }
%}
        

The declaration part above is compiled into the following source code:

namespace Math.Declarations
{
    public class MathDecls
    {
        // values of terminal symbols
        public class tokens
        {
            public const int NUMBER = 258;
            public const int add = 259;
            public const int sub = 260;
            public const int mul = 261;
            public const int div = 262;
            public const int shl = 263;
            public const int shr = 264;
            public const int lb = 265;
            public const int rb = 266;
            public const int unknown = 267;
        }

        // values of regular expressions
        public class regex
        {
            public const string decimal_digit = @"[0-9]";
            public const string number = @"[0-9]+";
            public const string add = @"\+";
            public const string sub = @"\-";
            public const string mul = @"\*";
            public const string div = @"\/";
            public const string shl = @"\<\<";
            public const string shr = @"\>\>";
            public const string lb = @"\(";
            public const string rb = @"\)";
        }
    }
}
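To see how the generated class might be consumed, here is a minimal sketch of a hand-written scanner that builds .NET Regex objects from MathDecls.regex and reports MathDecls.tokens values. MiniScanner is a hypothetical name and is not what Anglr generates for lexical analysis; the MathDecls class is copied from the listing above (with the shl and shr patterns written as \<\< and \>\>, as in the declaration part) so that the block is self-contained.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;
using Math.Declarations;

namespace Math.Declarations
{
    // Copy of the generated class from the listing above.
    public class MathDecls
    {
        public class tokens
        {
            public const int NUMBER = 258;
            public const int add = 259;
            public const int sub = 260;
            public const int mul = 261;
            public const int div = 262;
            public const int shl = 263;
            public const int shr = 264;
            public const int lb = 265;
            public const int rb = 266;
            public const int unknown = 267;
        }

        public class regex
        {
            public const string decimal_digit = @"[0-9]";
            public const string number = @"[0-9]+";
            public const string add = @"\+";
            public const string sub = @"\-";
            public const string mul = @"\*";
            public const string div = @"\/";
            public const string shl = @"\<\<";
            public const string shr = @"\>\>";
            public const string lb = @"\(";
            public const string rb = @"\)";
        }
    }
}

// Hypothetical scanner (not generated by Anglr) driven by the constants above.
public static class MiniScanner
{
    // \G anchors each pattern at the current scan position.
    private static Regex At(string pattern) => new Regex(@"\G(?:" + pattern + ")");

    // Pattern/token pairs, tried in this order at every position.
    private static readonly (Regex Pattern, int Token)[] Rules =
    {
        (At(MathDecls.regex.number), MathDecls.tokens.NUMBER),
        (At(MathDecls.regex.add), MathDecls.tokens.add),
        (At(MathDecls.regex.sub), MathDecls.tokens.sub),
        (At(MathDecls.regex.mul), MathDecls.tokens.mul),
        (At(MathDecls.regex.div), MathDecls.tokens.div),
        (At(MathDecls.regex.shl), MathDecls.tokens.shl),
        (At(MathDecls.regex.shr), MathDecls.tokens.shr),
        (At(MathDecls.regex.lb), MathDecls.tokens.lb),
        (At(MathDecls.regex.rb), MathDecls.tokens.rb),
    };

    public static List<int> Scan(string input)
    {
        var result = new List<int>();
        int pos = 0;
        while (pos < input.Length)
        {
            if (char.IsWhiteSpace(input[pos])) { pos++; continue; }

            int token = -1;   // negative value: no valid token recognized yet
            int length = 1;
            foreach (var (pattern, candidate) in Rules)
            {
                Match m = pattern.Match(input, pos);
                if (m.Success && m.Length > 0) { token = candidate; length = m.Length; break; }
            }
            if (token < 0) token = MathDecls.tokens.unknown;   // still invalid: report unknown
            result.Add(token);
            pos += length;
        }
        return result;
    }
}
```

For example, MiniScanner.Scan("12 << (3+4)") returns 258, 263, 265, 258, 259, 258, 266, that is NUMBER shl lb NUMBER add NUMBER rb. The local variable token starts at -1, which illustrates the convention of negative "not yet valid" values discussed for the int type below.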
        

In the example above, we notice the following:

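  • The name of the generated class, MathDecls, equals the value of the ClassName setting in the CompilationInfo attribute (ClassName='MathDecls'), and its namespace, Math.Declarations, equals the value of the NameSpace setting.
  • Terminal definitions are collected in the nested class named tokens. Their names are the same as the names in the terminal definitions of the declaration part of the Anglr file, and in this example their values are consecutive integers starting at 258, in declaration order. They are declared as: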
    • public: any class has access to them
    • const: they are compile-time constants and cannot be changed; they are read-only fields of class tokens.
    • int: they are integer constants. Their values are always greater than zero, so an unsigned type would also be adequate. They are nevertheless declared as int because they often appear in expressions that check whether a variable holds a valid token value. Such variables are usually initialized with a negative value, meaning that they do not yet hold a valid token when the application starts; once they are assigned a terminal value, they become valid. Checking a variable for a negative value therefore tells whether it is valid.
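  • Regular expressions are collected in the nested class named regex. Their names are also the same as the names in the regular expression definitions of the declaration part of the Anglr file, except that a hyphen, which is not allowed in a C# identifier, is replaced by an underscore (decimal-digit becomes decimal_digit). They are declared as: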
    • public: any class has access to them
    • const: they are compile-time constants and cannot be changed; they are read-only fields of class regex.
    • string: they are string constants. Their values are expressed in expanded form, where all references to other regular expressions are resolved. For instance, number is not defined as:
      public const string number = @"{decimal-digit}+";
                                  
      but in an expanded form, where {decimal-digit} is replaced with its value [0-9]:
      public const string number = @"[0-9]+";
                                  
      where we took into account the definitions of regular expressions number and decimal-digit in the declaration part of Anglr file.
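The rules listed above can be reproduced with a small generator. This is an illustration under stated assumptions, not the Anglr implementation: DeclarationEmitter is a hypothetical name, the starting value 258 and the declaration-order numbering are read off the MathDecls example, the hyphen-to-underscore rule is inferred from decimal_digit, and the sketch assumes that a regular expression only references expressions defined before it.

```csharp
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical generator reproducing the rules seen in the MathDecls example;
// it is an illustration, not the Anglr compiler's implementation.
public static class DeclarationEmitter
{
    private const int FirstTokenValue = 258;   // first token value in the example

    // Assumed mapping: hyphens are not legal in C# identifiers, so they become underscores.
    public static string FieldName(string anglrName, string prefix = "") =>
        prefix + anglrName.Replace('-', '_');

    // Resolves {name} references; assumes each referenced expression was defined earlier.
    public static Dictionary<string, string> Expand(IList<(string Name, string Pattern)> defs)
    {
        var expanded = new Dictionary<string, string>();
        foreach (var (name, pattern) in defs)
        {
            expanded[name] = Regex.Replace(pattern, @"\{([A-Za-z_][A-Za-z0-9_-]*)\}",
                m => expanded.TryGetValue(m.Groups[1].Value, out var value)
                    ? value
                    : throw new KeyNotFoundException("undefined regular expression " + m.Value));
        }
        return expanded;
    }

    // Terminals are numbered consecutively from 258 in declaration order.
    public static string EmitTokens(IEnumerable<string> terminals, string prefix = "")
    {
        var sb = new StringBuilder("public class tokens\n{\n");
        int value = FirstTokenValue;
        foreach (string terminal in terminals)
            sb.Append($"    public const int {FieldName(terminal, prefix)} = {value++};\n");
        return sb.Append("}\n").ToString();
    }

    // Regular expressions are emitted as verbatim strings in expanded form.
    public static string EmitRegex(IList<(string Name, string Pattern)> defs, string prefix = "")
    {
        var expanded = Expand(defs);
        var sb = new StringBuilder("public class regex\n{\n");
        foreach (var (name, _) in defs)
        {
            string literal = expanded[name].Replace("\"", "\"\"");   // escape " inside @"..."
            sb.Append($"    public const string {FieldName(name, prefix)} = @\"{literal}\";\n");
        }
        return sb.Append("}\n").ToString();
    }
}
```

Fed with the definitions of mathDecls, EmitTokens starts with NUMBER = 258 and EmitRegex emits number as @"[0-9]+", matching the listing. Passing a prefix such as "token_" shows how TokenPrefix would affect both classes.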

Sometimes the generated code is not valid in the target programming language. Take, for example, the C# syntax rules presented in Appendix-E. At the beginning of their declaration part there is this block of terminal symbol definitions:

    %terminal
    {
        namespace identifier extern alias using float double sbyte byte short ushort int uint
        long ulong char decimal bool nullable-value-type ref out object string base this
        new typeof void await checked default unchecked delegate async from in group by join
        on equals into let orderby ascending descending is as select where break continue do
        while finally for foreach catch goto case if else const var lock return switch throw
        try yield class partial public protected internal private abstract sealed static struct
        readonly volatile virtual override get params set add event remove operator implicit
        explicit true false interface enum unsafe fixed stackalloc dynamic sizeof null
        right-shift right-shift-assignment
    }
            
This piece of Anglr code defines terminal symbols that represent C# keywords, and their names are the keywords themselves. This is perfectly legal, since Anglr knows nothing about C# and its keywords. When we compile this Anglr file with the Anglr compiler, we get the following C# code:
        public class tokens
        {
            public const int namespace = 258;
            public const int identifier = 259;
            public const int extern = 260;
            public const int alias = 261;
            public const int using = 262;
            public const int float = 263;
            public const int double = 264;
            public const int sbyte = 265;
            public const int byte = 266;
            // and many other definitions of public const int fields
            
Most of these field names (namespace, extern, using, float, double, sbyte and byte in the fragment above, but not identifier or alias) are reserved words in C#, and the code will not compile, since reserved words cannot be used as field names. There are two solutions to this problem: we can rename the terminal symbols in the Anglr file, or, far better, we can use the TokenPrefix setting, as in this example:
[ CompilationInfo ClassName='CsharpDeclarations' NameSpace='Csharp.Declarations' Access='public' TokenPrefix='token_' Hover='true' ]
%declarations csharpDecls
%{
    %terminal
    {
        namespace identifier extern alias using float double sbyte byte short ushort int uint
        long ulong char decimal bool nullable-value-type ref out object string base this
        new typeof void await checked default unchecked delegate async from in group by join
        on equals into let orderby ascending descending is as select where break continue do
        while finally for foreach catch goto case if else const var lock return switch throw
        try yield class partial public protected internal private abstract sealed static struct
        readonly volatile virtual override get params set add event remove operator implicit
        explicit true false interface enum unsafe fixed stackalloc dynamic sizeof null
        right-shift right-shift-assignment
    }
            
where the setting TokenPrefix='token_' produces the following code, in which every field name is a valid C# identifier:
        public class tokens
        {
            public const int token_namespace = 258;
            public const int token_identifier = 259;
            public const int token_extern = 260;
            public const int token_alias = 261;
            public const int token_using = 262;
            public const int token_float = 263;
            public const int token_double = 264;
            public const int token_sbyte = 265;
            public const int token_byte = 266;
            // and many other definitions of public const int fields
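Which names actually need the prefix can be checked mechanically. The helper below is a hypothetical sketch, not part of Anglr: KeywordCheck lists the terminal names that collide with C# reserved keywords, optionally after a TokenPrefix has been applied. It assumes, as the decimal_digit field in the MathDecls example suggests, that hyphens in Anglr names become underscores.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of Anglr: reports terminal names that would
// break the generated tokens class because they are C# reserved keywords.
public static class KeywordCheck
{
    // C# reserved keywords. Contextual keywords such as alias, var or await are
    // valid field names and are deliberately not listed.
    private static readonly HashSet<string> Reserved = new HashSet<string>
    {
        "abstract", "as", "base", "bool", "break", "byte", "case", "catch", "char",
        "checked", "class", "const", "continue", "decimal", "default", "delegate", "do",
        "double", "else", "enum", "event", "explicit", "extern", "false", "finally",
        "fixed", "float", "for", "foreach", "goto", "if", "implicit", "in", "int",
        "interface", "internal", "is", "lock", "long", "namespace", "new", "null",
        "object", "operator", "out", "override", "params", "private", "protected",
        "public", "readonly", "ref", "return", "sbyte", "sealed", "short", "sizeof",
        "stackalloc", "static", "string", "struct", "switch", "this", "throw", "true",
        "try", "typeof", "uint", "ulong", "unchecked", "unsafe", "ushort", "using",
        "virtual", "void", "volatile", "while"
    };

    // Field names (after prefixing and the assumed hyphen mapping) that would not compile.
    public static List<string> Conflicts(IEnumerable<string> terminals, string tokenPrefix = "") =>
        terminals.Select(t => tokenPrefix + t.Replace('-', '_'))
                 .Where(name => Reserved.Contains(name))
                 .ToList();
}
```

Applied to the start of the terminal list above, the unprefixed check reports namespace, extern and using while identifier and alias pass. With TokenPrefix='token_' no conflicts remain, because no C# keyword begins with token_.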