Anglr Compiler
Introduction
Anglr compiler is a tool which translates syntax and lexical rules specified in an Angler file into the set of source files written in selected programming language. It is used in two ways:
- manually from the command prompt
- automatically as an extension of Microsoft Visual Studio.
Currently it has only the following parameters:
- -d - switch to activate debug messages
- -t - switch which enables generation of source code. The name of this switch is attributed to the first character of the last word in acronym CST (concrete syntax tree). The large part of generated source code actually belong to the classes and their methods associated with the manipulation of the CSTs.
- -l - switch which generates code which will detect run-time loops in generated code
- -i - switch which changes the generation of CST manipulation methods: recursive methods are transformed in iterative one
- -p - switch which enables automatic detection of precedence grammar. Generated parser is optimized for precedence syntax rules.
- -gen <lang> - parameter with wich we can specify the programming language of generated code. Currently only cs is valid selection for <lang>, meaning C# source code
- <src> - name of Anglr source file, containing the syntax rules which will be translated into the set of source files of selected programming language.
All other parameters concerning the compilation process are specified in Anglr source files.
Example:
anglr -t -l -gen cs math.anglr > math-anglr.txt
With this command, we translate the syntax rules in file math.anglr into a set of C# source files. Output messages are stored into the file math-anglr.txt
Generated source code
An Anglr compiler generates source code in the selected programming language using the instructions provided in the Anglr file, which makes it possible to create a text analsysis tool for text files whose content is built according to the syntax rules found in the said Anglr file. The most important product of the compiler is the source code generated in the selected programming language. The generated source code can be divided into the following functional sets::
- lexical analyzer source code
- syntax analzer source code
- source code for creating a concrete syntax tree (CST)
- source code for semantic analysis with the help of CST
- source code for qdditional functionalities
The structure of generated source code should be described in this way:
- for every syntax rule found in any parser part of Anglr file there is a file in selected programming language containig the definition of a class associated with this syntax rule
- for every declaration part of Anglr file there is a file in selected programming language containing the definition of a class associated with this part
- for every scanner part of Anglr file there is a file in selected programming language containing the definition of a class associated with this part
- for every lexical part of Anglr file there is a source file in selected programming language containing the definition of a class representing this part.
- for every parser part of Anglr file there is a source file in a selected programming language containg the definition of a class representing this parser part.
- the implementation of a class associated with the concrete syntax tree builder.
- the implementation of a class associated with concrete syntax tree traversal methods.
What's the purpose of the generated source code? With this source code, we can build an application that analyzes text files whose content meets the syntax rules listed in the Anglr file associated with generated source code. All these applications are built in a similar way following this scenario:
- Syntax analyzer performes syntax analyses of the input text file. This step of the application uses classes associated with parser parts of Anglr file.
- Input text file is nothing but a string of terminal symbols separated by spaces. Lexical analyzer is intensively used by syntax analyzer to extract terminal symbols from the input text file. Lexical analyzer is implemented with classes associated with the lexer parts of Anglr file.
- Lexical analyzer use scanners to extracts terminal symbols from input text file. It can invoke as many scanners as needed. Scanners are implemented with classes associated with scanner parts of Anglr file.
- A concrete syntax tree (CST) is also built during the syntax analysis of the text file. The build begins with the first terminal read, and ends after the last one is read. Methods of class associated with CST construction are responsible to do that.
- When the syntax analysis is complete and the CST is constructed, the most important part begins: semantic analysis of the source file, in the broadest sense of that word. Semantic analysis is essentially a transformation of CST into a new form. Each node of CST represents part of the source text file. The leaves of this CST represent terminal symbols together with their character representation, and the inner nodes represent non-terminal symbols. The inner nodes that are closer to the top of the CST represent a larger portion of the source file than those located lower. The top node represents the entire file, while the leaves represent the individual terminal symbols. To perform semantic analysis we can use CST traversal methods of the class which performs them. CST can be traversed as many times as needed. Any iteration can perform a different task. At every iteration of a CST traversal we can do a part of semantic analysis or transformation of the source text. In that manner we can do as many text trasnformations as we need to do them. If these transformations are independent, then they can be performed together in one iteration, otherwise they should be performed sequentially.
Every step of scenario described above can be monitored by application. This is done by monitoring the events triggered by the methods that implement scenario described above. Every event can be associated with appropriate method provided by application. When an event is triggered, associated application method is invoked. A huge number of events are triggered, whether it is during the reading phase of the input text file or during the post-processing phase of the CST.
Class associated with syntax rule
Every syntax rule is associated with exactly one class implemented in selected programming language. Instances of this class represent nodes of concrete syntax tree (CST) generated by syntax analyzer. All classes are composed in a similar way, as follows:
- their names are a modified versions of the syntax rule names to which they belong.
- they are subclasses of the class SyntaxTreeBase
- enumeration production_kind enumerates all productions of class associated with particular syntax rule
- There is one constructor for each production of syntax rule.
- Every class has explicitely defined copy constructor
- then the Clone () method is specified. It is wrapper for copy constructor.
- The Dispose method is used to release a sub-tree that is rooted in node associated with this syntax rule.
- for every production of syntax tree there is a change () method which completely rebuilds the sub-tree of node
- the checkInclusion () method is used to verify if a node is located in a sub-tree.
- Emit () and EmitProductionTree () methods are used to display the sub-tree of a given node
- InvokeTraverse () method is used to traverse the sub-tree of a given node
- reparent () method is used to move the sub-tree to another node
- init () method initializes all properties of the node
- every class has static counter which counts all nodes created by this class.
- public properties of this class represent the branches emanating from the node
Let's take a look at the well-known example of syntax rule and at the class generated by Anglr compiler that belongs to it so we can get a sense of what we're talking about:
additive-expression : multiplicative-expression | additive-expression '+' multiplicative-expression | additive-expression '-' multiplicative-expression ;
Class associated with above syntax rule is presented in the text below:
namespace Math.Parser
{
//
// class associated with syntax rule additive-expression
//
public class additive_expression : SyntaxTreeBase
{
public enum production_kind // enumerated production(s) of syntax rule additive-expression
{
g_additive_expression_1 = 1, // multiplicative-expression
g_additive_expression_2 = 2, // additive-expression add multiplicative-expression
g_additive_expression_3 = 3, // additive-expression sub multiplicative-expression
};
// Constructor declaration(s) associated with production(s) of syntax rule additive-expression
//
// Constructor associated with the following production(s)
// additive-expression: multiplicative-expression
//
public additive_expression (multiplicative_expression p_multiplicative_expression) : base ((uint) SyntaxTreeWalker.ProductionID._additive_expression_ID, (uint) production_kind.g_additive_expression_1)
{
++g_counter;
_init ();
m_multiplicative_expression = p_multiplicative_expression;
}
//
// Constructor associated with the following production(s)
// additive-expression: additive-expression add multiplicative-expression
// additive-expression: additive-expression sub multiplicative-expression
//
public additive_expression (additive_expression p_additive_expression, SyntaxTreeToken p_token, multiplicative_expression p_multiplicative_expression, int kind) : base ((uint) SyntaxTreeWalker.ProductionID._additive_expression_ID, (uint) kind)
{
++g_counter;
_init ();
switch ((production_kind) this.kind)
{
case production_kind.g_additive_expression_2:
m_additive_expression = p_additive_expression;
m_add = p_token;
m_multiplicative_expression = p_multiplicative_expression;
break;
case production_kind.g_additive_expression_3:
m_additive_expression = p_additive_expression;
m_sub = p_token;
m_multiplicative_expression = p_multiplicative_expression;
break;
default:
{
string[] args = new string[] { "additive_expression" };
throw new SyntaxTreeError (SyntaxTreeError.InvalidKindError, args);
}
}
}
// Copy constructor
public additive_expression (additive_expression p_additive_expression) : base (p_additive_expression.id, p_additive_expression.kind)
{
++g_counter;
if ((m_multiplicative_expression = (p_additive_expression.m_multiplicative_expression != null) ? new multiplicative_expression (p_additive_expression.m_multiplicative_expression) : null) != null) m_multiplicative_expression.parent = this;
if ((m_additive_expression = (p_additive_expression.m_additive_expression != null) ? new additive_expression (p_additive_expression.m_additive_expression) : null) != null) m_additive_expression.parent = this;
if ((m_add = (p_additive_expression.m_add != null) ? new SyntaxTreeToken (p_additive_expression.m_add) : null) != null) m_add.parent = this;
if ((m_sub = (p_additive_expression.m_sub != null) ? new SyntaxTreeToken (p_additive_expression.m_sub) : null) != null) m_sub.parent = this;
}
// Clone node - wrapper of copy constructor
public override SyntaxTreeBase Clone () { return (SyntaxTreeBase) new additive_expression (this); }
// Destructor
public new void Dispose ()
{
base.Dispose ();
--g_counter;
if (m_multiplicative_expression != null)
{
m_multiplicative_expression.Dispose ();
m_multiplicative_expression =null;
}
if (m_additive_expression != null)
{
m_additive_expression.Dispose ();
m_additive_expression =null;
}
if (m_add != null)
{
m_add.Dispose ();
m_add =null;
}
if (m_sub != null)
{
m_sub.Dispose ();
m_sub =null;
}
_init ();
}
// Content changing function(s) associated with production(s) of syntax rule additive-expression
//
// Content changing function associated with following production(s)
// additive-expression: multiplicative-expression
//
public void change(multiplicative_expression p_multiplicative_expression)
{
_init ();
m_kind = (uint) production_kind.g_additive_expression_1;
m_multiplicative_expression = p_multiplicative_expression;
reparent (parent);
}
//
// Content changing function associated with following production(s)
// additive-expression: additive-expression add multiplicative-expression
// additive-expression: additive-expression sub multiplicative-expression
//
public void change(additive_expression p_additive_expression, SyntaxTreeToken p_token, multiplicative_expression p_multiplicative_expression, int kind)
{
_init ();
m_kind = (uint) kind;
switch ((production_kind) this.kind)
{
case production_kind.g_additive_expression_2:
m_additive_expression = p_additive_expression;
m_add = p_token;
m_multiplicative_expression = p_multiplicative_expression;
break;
case production_kind.g_additive_expression_3:
m_additive_expression = p_additive_expression;
m_sub = p_token;
m_multiplicative_expression = p_multiplicative_expression;
break;
default:
{
string[] args = new string[] { "additive_expression" };
throw new SyntaxTreeError (SyntaxTreeError.InvalidKindError, args);
}
}
reparent (parent);
}
//
// check if node 'element' is included within syntax tree associated with syntax rule additive-expression
//
public new bool checkInclusion (SyntaxTreeBase element)
{
return
(element == this) ||
((m_multiplicative_expression != null) && m_multiplicative_expression.checkInclusion (element)) ||
((m_additive_expression != null) && m_additive_expression.checkInclusion (element)) ||
((m_add != null) && m_add.checkInclusion (element)) ||
((m_sub != null) && m_sub.checkInclusion (element)) ||
false;
}
//
// emit production tree node associated with any production of syntax rule additive-expression
//
public override string Emit (int depth)
{
string s = "";
if (depth != 0)
switch ((production_kind) kind)
{
case production_kind.g_additive_expression_1:
// emit syntax tree node associated with production
// additive-expression: multiplicative-expression
s += m_multiplicative_expression.Emit (depth - 1);
break;
case production_kind.g_additive_expression_2:
// emit syntax tree node associated with production
// additive-expression: additive-expression add multiplicative-expression
s += m_additive_expression.Emit (depth - 1);
s += " ";
s += m_add.Emit(depth - 1);
s += " ";
s += m_multiplicative_expression.Emit (depth - 1);
break;
case production_kind.g_additive_expression_3:
// emit syntax tree node associated with production
// additive-expression: additive-expression sub multiplicative-expression
s += m_additive_expression.Emit (depth - 1);
s += " ";
s += m_sub.Emit(depth - 1);
s += " ";
s += m_multiplicative_expression.Emit (depth - 1);
break;
}
return s;
}
//
// emit production tree node associated with any production of syntax rule additive-expression
//
public override string EmitProductionTree (int depth)
{
string s = "additive_expression (";
if (depth != 0)
switch ((production_kind) kind)
{
case production_kind.g_additive_expression_1:
// emit syntax tree node associated with production
// additive-expression: multiplicative-expression
s += m_multiplicative_expression.EmitProductionTree (depth - 1);
break;
case production_kind.g_additive_expression_2:
// emit syntax tree node associated with production
// additive-expression: additive-expression add multiplicative-expression
s += m_additive_expression.EmitProductionTree (depth - 1);
s += ' ';
s += "add";
s += ' ';
s += m_multiplicative_expression.EmitProductionTree (depth - 1);
break;
case production_kind.g_additive_expression_3:
// emit syntax tree node associated with production
// additive-expression: additive-expression sub multiplicative-expression
s += m_additive_expression.EmitProductionTree (depth - 1);
s += ' ';
s += "sub";
s += ' ';
s += m_multiplicative_expression.EmitProductionTree (depth - 1);
break;
}
s += ")";
return s;
}
//
// traverse sub-tree rooted in this node using selected syntax tree walker
//
public override void InvokeTraverse (SyntaxTreeWalker syntaxTreeWalker)
{
syntaxTreeWalker.Traverse (this);
}
// Change parent node - move sub-tree
public override void reparent (SyntaxTreeBase parent)
{
this.parent = parent;
if (m_multiplicative_expression != null)
m_multiplicative_expression.reparent (this);
if (m_additive_expression != null)
m_additive_expression.reparent (this);
if (m_add != null)
m_add.reparent (this);
if (m_sub != null)
m_sub.reparent (this);
}
//
// initialize object associated with syntax rule additive-expression
//
public void _init ()
{
m_multiplicative_expression = null;
m_additive_expression = null;
m_add = null;
m_sub = null;
}
// counter of all nodes associated with syntax rule additive-expression
public static int g_counter;
// objects associated with terminal and non-terminal symbols within production(s) of syntax rule additive-expression
public multiplicative_expression m_multiplicative_expression { get; private set; }
public additive_expression m_additive_expression { get; private set; }
public SyntaxTreeToken m_add { get; private set; }
public SyntaxTreeToken m_sub { get; private set; }
};
public partial class SyntaxTreeWalker
{
// delegate function (callback) prototype associated with syntax rule additive-expression
public delegate bool additive_expression_Callback (SyntaxTreeCallbackReason reason, additive_expression.production_kind kind, additive_expression p_additive_expression);
// event associated with syntax rule additive-expression
public event additive_expression_Callback additive_expression_Event;
// event trigger associated with syntax rule additive-expression
public bool Raise_additive_expression_Event (SyntaxTreeCallbackReason reason, additive_expression.production_kind kind, additive_expression p_additive_expression)
{
bool? status = additive_expression_Event?.Invoke (reason, kind, p_additive_expression);
return (status == null) || status.Value;
}
//
// traverse syntax tree node associated with any production of syntax rule additive-expression
//
public void Traverse (additive_expression p_additive_expression)
{
if (p_additive_expression.isLocked())
return;
p_additive_expression.dolock();
additive_expression.production_kind kind = (additive_expression.production_kind) p_additive_expression.kind;
p_additive_expression.turn_reset ();
if (Raise_additive_expression_Event (SyntaxTreeCallbackReason.TraversalPrologueCallbackReason, kind, p_additive_expression))
switch (kind)
{
case additive_expression.production_kind.g_additive_expression_1:
// traverse syntax tree node associated with production
// additive-expression: multiplicative-expression
if (Raise_additive_expression_Event (SyntaxTreeCallbackReason.TraversalMidTermCallbackReason, kind, p_additive_expression))
Traverse (p_additive_expression.m_multiplicative_expression);
p_additive_expression.turn_inc ();
Raise_additive_expression_Event (SyntaxTreeCallbackReason.TraversalMidTermCallbackReason, kind, p_additive_expression);
break;
case additive_expression.production_kind.g_additive_expression_2:
// traverse syntax tree node associated with production
// additive-expression: additive-expression add multiplicative-expression
if (Raise_additive_expression_Event (SyntaxTreeCallbackReason.TraversalMidTermCallbackReason, kind, p_additive_expression))
Traverse (p_additive_expression.m_additive_expression);
p_additive_expression.turn_inc ();
if (Raise_additive_expression_Event (SyntaxTreeCallbackReason.TraversalMidTermCallbackReason, kind, p_additive_expression))
Traverse (p_additive_expression.m_multiplicative_expression);
p_additive_expression.turn_inc ();
Raise_additive_expression_Event (SyntaxTreeCallbackReason.TraversalMidTermCallbackReason, kind, p_additive_expression);
break;
case additive_expression.production_kind.g_additive_expression_3:
// traverse syntax tree node associated with production
// additive-expression: additive-expression sub multiplicative-expression
if (Raise_additive_expression_Event (SyntaxTreeCallbackReason.TraversalMidTermCallbackReason, kind, p_additive_expression))
Traverse (p_additive_expression.m_additive_expression);
p_additive_expression.turn_inc ();
if (Raise_additive_expression_Event (SyntaxTreeCallbackReason.TraversalMidTermCallbackReason, kind, p_additive_expression))
Traverse (p_additive_expression.m_multiplicative_expression);
p_additive_expression.turn_inc ();
Raise_additive_expression_Event (SyntaxTreeCallbackReason.TraversalMidTermCallbackReason, kind, p_additive_expression);
break;
}
Raise_additive_expression_Event (SyntaxTreeCallbackReason.TraversalEpilogueCallbackReason, kind, p_additive_expression);
p_additive_expression.unlock();
}
//
// traverse syntax tree node associated with any production of syntax rule additive-expression
//
public void TraverseCommon (additive_expression p_additive_expression)
{
additive_expression.production_kind kind = (additive_expression.production_kind) p_additive_expression.kind;
if (Raise_Common_Event (SyntaxTreeCallbackReason.TraversalPrologueCallbackReason, (int) kind, p_additive_expression))
switch (kind)
{
case additive_expression.production_kind.g_additive_expression_1:
// traverse syntax tree node associated with production
// additive-expression: multiplicative-expression
TraverseCommon (p_additive_expression.m_multiplicative_expression);
break;
case additive_expression.production_kind.g_additive_expression_2:
// traverse syntax tree node associated with production
// additive-expression: additive-expression add multiplicative-expression
TraverseCommon (p_additive_expression.m_additive_expression);
TraverseCommon (p_additive_expression.m_add);
TraverseCommon (p_additive_expression.m_multiplicative_expression);
break;
case additive_expression.production_kind.g_additive_expression_3:
// traverse syntax tree node associated with production
// additive-expression: additive-expression sub multiplicative-expression
TraverseCommon (p_additive_expression.m_additive_expression);
TraverseCommon (p_additive_expression.m_sub);
TraverseCommon (p_additive_expression.m_multiplicative_expression);
break;
}
Raise_Common_Event (SyntaxTreeCallbackReason.TraversalEpilogueCallbackReason, (int) kind, p_additive_expression);
}
};
public partial class SyntaxTreeBuilder : SyntaxTreeWalker
{
//
// create syntax tree node associated with production
// additive-expression: multiplicative-expression
//
public additive_expression additive_expression_1 (multiplicative_expression p_multiplicative_expression)
{
additive_expression p_additive_expression_ref = new additive_expression(p_multiplicative_expression);
p_multiplicative_expression.parent = p_additive_expression_ref;
Raise_additive_expression_Event (SyntaxTreeCallbackReason.BuilderCallbackReason, additive_expression.production_kind.g_additive_expression_1, p_additive_expression_ref);
return p_additive_expression_ref;
}
//
// create syntax tree node associated with production
// additive-expression: additive-expression add multiplicative-expression
//
public additive_expression additive_expression_2 (additive_expression p_additive_expression, SyntaxTreeToken p_token, multiplicative_expression p_multiplicative_expression)
{
additive_expression p_additive_expression_ref = new additive_expression(p_additive_expression, p_token, p_multiplicative_expression, 2);
p_additive_expression.parent = p_additive_expression_ref;
p_token.parent = p_additive_expression_ref;
p_multiplicative_expression.parent = p_additive_expression_ref;
Raise_additive_expression_Event (SyntaxTreeCallbackReason.BuilderCallbackReason, additive_expression.production_kind.g_additive_expression_2, p_additive_expression_ref);
return p_additive_expression_ref;
}
//
// create syntax tree node associated with production
// additive-expression: additive-expression sub multiplicative-expression
//
public additive_expression additive_expression_3 (additive_expression p_additive_expression, SyntaxTreeToken p_token, multiplicative_expression p_multiplicative_expression)
{
additive_expression p_additive_expression_ref = new additive_expression(p_additive_expression, p_token, p_multiplicative_expression, 3);
p_additive_expression.parent = p_additive_expression_ref;
p_token.parent = p_additive_expression_ref;
p_multiplicative_expression.parent = p_additive_expression_ref;
Raise_additive_expression_Event (SyntaxTreeCallbackReason.BuilderCallbackReason, additive_expression.production_kind.g_additive_expression_3, p_additive_expression_ref);
return p_additive_expression_ref;
}
};
}
Class name
Name of the class associated with syntax rule is similar to the name of syntax rule. The compiler is trying to make a class name that is as similar as possible to the syntax rule name. If the syntax rule name contains characters that should not be contained within variable names in the selected programing language, it replaces them with an underscore. If syntax rule names does not contain such characters then their associated class names are equal to their names.
For example, let's take a look at a well-known example of a syntax rule:
additive-expression : multiplicative-expression | additive-expression '+' multiplicative-expression | additive-expression '-' multiplicative-expression ;
Since the name of syntax rule additive-expression contains character '-' that should not be used to compose variable names in most programming languages, Anglr compiler will turn it into character '_' when it composes the associated class name. The name of associated class would then be additive_expression.
Now let's imagine that the syntax rule name would be equal to <additive expression>. In this case the associated class name would be equal to _additive_expression_, since characters '<', '>' and ' ' should not be used to compose variable names in most programming languages.
Super class SyntaxTreeBase
Enumeration of productions
Constructors
Productions of some syntax rule having the same length and whose non-terminal symbols at the same position have the same names shall have the same constructors. In example above this is the case with the following productions:
additive-expression '+' multiplicative-expression
additive-expression '-' multiplicative-expression