Anglr Software Package

Overview


The documentation on this website describes the Anglr software package. This package is used to create syntax, lexical, and semantic analyzers, together with procedures for building the syntax trees used in compilers and other syntax-oriented text transformations.

The term 'compiler', as used in this text, should be understood in a broader sense. Syntax rules can be written for any data structure, for example a list of people's addresses. With the Anglr compiler, we can translate these rules into source code that implements a syntax parser for such a list. Together with the functions that form the skeleton of the semantic functions for an address list, we can create a program that converts the list into something else: a web page, a sorted list, a colored list, and so on. Each of these conversions is in fact a translation of the address list into some other form, so we can say that we have created a compiler that converts the address list into a web page, a compiler that sorts it, or a compiler that colors it. In general, any tool that performs syntax-oriented data conversion can be called a compiler.
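To make the idea concrete, here is a minimal hand-written Python sketch of such a translation. It is not code generated by Anglr. It assumes a hypothetical address-list format in which each line holds one record as `name; street; city`, and the names `parse_addresses` and `to_html` are illustrative only. The parsing step checks the syntax, and the translation step plays the role of the semantic functions.

```python
from html import escape


def parse_addresses(text: str) -> list[tuple[str, str, str]]:
    """Parse lines of the hypothetical form 'name; street; city' into records."""
    records = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines carry no record
        fields = [field.strip() for field in line.split(";")]
        if len(fields) != 3:
            raise SyntaxError(f"line {lineno}: expected 3 fields, found {len(fields)}")
        records.append((fields[0], fields[1], fields[2]))
    return records


def to_html(records: list[tuple[str, str, str]]) -> str:
    """Semantic part: translate the parsed records into an HTML list."""
    items = "".join(
        f"<li>{escape(name)}, {escape(street)}, {escape(city)}</li>"
        for name, street, city in records
    )
    return f"<ul>{items}</ul>"


source = "John Smith; High Street 5; London\nAna Novak; Main Street 1; Ljubljana\n"
records = parse_addresses(source)
print(to_html(records))          # one "compiler": address list -> web page
print(to_html(sorted(records)))  # another one: address list -> sorted web page
```

The parser is written once, and each new output form only needs a different translation step.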

Application source code and Anglr specifications are strictly separated. This is unlike older tools such as the legendary yacc, where injecting source code into the grammar was practically the only way to access the functionality of the generated syntax analyzer. Thanks to this separation, Anglr specifications become reusable: the same specifications and, most importantly, the same source code generated from them by the Anglr compiler can be reused. The generated source code can be viewed as a large connector with a great number of pins serving different needs, for example to monitor or change the operation of any single step of the syntax, lexical, and semantic analyzers. With the help of this connector, an application can perform any syntax-oriented translation of a source file. The same connector can serve different tasks by attaching different logic to its pins, which is why the separation is so important.
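The following Python sketch illustrates the connector idea in miniature. It is hand-written, not generated by Anglr, and does not reflect Anglr's real interfaces. The `Handler` base class, its hook names `on_entry` and `on_error`, and the `key = value` input format are all hypothetical. The parser only reports what it recognizes through the hooks (the pins), and two applications attach different logic to the same parser.

```python
class Handler:
    """Default pins: every hook does nothing, so applications override only what they need."""

    def on_entry(self, key: str, value: str) -> None:
        pass

    def on_error(self, lineno: int, line: str) -> None:
        pass


def parse(text: str, handler: Handler) -> None:
    """Parse 'key = value' lines and report every step to the handler.

    The parser itself knows nothing about what the application does with the data.
    """
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        key, sep, value = line.partition("=")
        if not sep or not key.strip():
            handler.on_error(lineno, line)
        else:
            handler.on_entry(key.strip(), value.strip())


class DictBuilder(Handler):
    """One application: collect the entries into a dictionary."""

    def __init__(self) -> None:
        self.result: dict[str, str] = {}

    def on_entry(self, key: str, value: str) -> None:
        self.result[key] = value


class ErrorReporter(Handler):
    """Another application: collect diagnostics only and ignore the data."""

    def __init__(self) -> None:
        self.errors: list[str] = []

    def on_error(self, lineno: int, line: str) -> None:
        self.errors.append(f"line {lineno}: cannot parse {line!r}")


text = "name = Anglr\nversion = 1\noops\n"
builder, reporter = DictBuilder(), ErrorReporter()
parse(text, builder)   # same parser ...
parse(text, reporter)  # ... different pin logic
print(builder.result)
print(reporter.errors)
```

Because the parser never decides what happens to the data, a third application only needs a third handler. The parser itself does not change.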

The name Anglr refers to several things:

  • Anglr Language: a set of formal rules used to write the syntax rules of a particular programming language. These rules are usually stored in so-called anglr files, which have the .anglr extension.
  • Anglr Compiler: the software tool that compiles the specifications stored in an anglr file. The Anglr compiler translates these specifications into a set of source files that implement the syntax, lexical, and semantic analyzers defined in that anglr file. Syntax, lexical, and semantic analyzers are the basic building blocks of any syntax-oriented transformation, including compilers themselves.
  • Microsoft Visual Studio Anglr extension: a Language Server Protocol extension used to edit the contents of anglr files.
  • Microsoft Visual Studio Anglr project template: used to automate the translation of anglr files and of the source files generated by the Anglr compiler.

The Anglr language plays a central role in the software package described above. To understand the material presented on these pages, it helps to know the basics of the theory of context-free languages. Plenty of introductory material on this topic is available on the web.

Example of an .anglr file:

[ Description Text='definitions of tokens and regular expressions used to define syntax']
[ Description Text='of simple arithmetic expressions']
[ CompilationInfo ClassName='MathDecls' NameSpace='Math.Declarations' Access='public']
%declarations mathDecls
%{
    %regex
    {
        decimal-digit [0-9]
        number {decimal-digit}+
    }

    %terminal
    {
        NUMBER
        add '+'
        sub '-'
        mul '*'
        div '/'
        lb '('
        rb ')'
        unknown
    }
%}

[ Description Text='definition of scanner, which extracts comments from input string']
[ Declarations Id='mathDecls' ]
[ CompilationInfo ClassName='CommentRegex' NameSpace='Math.ScannerLib' Access='public']
%scanner commentScanner
%{
[\*]+\/
    pop
[\n\r]
    skip
[^\*]+
    skip
[\*]+
    skip
%}

[ Description Text='definition of scanner, which extracts terminal symbols from input string']
[ Declarations Id='mathDecls' ]
[ CompilationInfo ClassName='MathRegex' NameSpace='Math.ScannerLib' Access='public']
%scanner mathScanner
%{
\/\*
    push commentScanner
{number}
    terminal NUMBER
\+
    terminal add
\-
    terminal sub
\*
    terminal mul
\/
    terminal div
\(
    terminal lb
\)
    terminal rb
[ \t]+
    skip
[\n\r]
    skip
.
    skip
%}

[ Description Text='Lexer for anglr file' Hover='true' ]
[
    UseScanner
        ScannerId='commentScanner'
        InitialScanner='mathScanner'
        Hover='true'
]
[ CompilationInfo ClassName='MathLexer' NameSpace='Math.Lexer' Access='public' Hover='true' ]
%lexer mathLexer
%{

%}

[ Description Text='definition of parser for simple arithmetic expressions']
[ Declarations Id='mathDecls' ]
[ Lexer Id='mathLexer' ]
[ CompilationInfo ClassName='MathParser' NameSpace='Math.Parser' Access='public']
%parser mathParser1
%{

[ Start ]
expression
    : additive-expression
    ;

additive-expression
    : multiplicative-expression
    | additive-expression '+' multiplicative-expression
    | additive-expression '-' multiplicative-expression
    ;

multiplicative-expression
    : unary-expression
    | multiplicative-expression '*' unary-expression
    | multiplicative-expression '/' unary-expression
    ;

unary-expression
    : NUMBER
    | '(' expression ')'
    ;

%}
            

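The example above defines a language of simple arithmetic expressions built from addition, subtraction, multiplication, and division. The grammar encodes operator precedence: multiplication and division bind more tightly than addition and subtraction because multiplicative-expression is nested inside additive-expression. Its left-recursive rules make all four operators left-associative. This example (or modified versions of it) is used in many places on this website.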
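To show what the scanners and the parser in this example are meant to do, here is a small hand-written Python sketch. It is not the code the Anglr compiler generates and uses none of Anglr's APIs. The names `SCANNERS`, `tokenize`, `Parser`, and `evaluate` are hypothetical. It makes three assumptions: lex-style matching (the longest match wins, and the earlier rule wins a tie), a scanner stack for `push`/`pop`, and evaluation of the expression during parsing instead of building a syntax tree.

```python
import re

# Rules transcribed from mathScanner/commentScanner; actions: ("terminal", name), ("skip",),
# ("push", scanner), ("pop",). Assumed (lex-style): longest match wins, earlier rule breaks ties.
SCANNERS = {
    "math": [
        (r"/\*", ("push", "comment")),
        (r"[0-9]+", ("terminal", "NUMBER")),
        (r"\+", ("terminal", "add")),
        (r"-", ("terminal", "sub")),
        (r"\*", ("terminal", "mul")),
        (r"/", ("terminal", "div")),
        (r"\(", ("terminal", "lb")),
        (r"\)", ("terminal", "rb")),
        (r"[ \t]+", ("skip",)),
        (r"[\n\r]", ("skip",)),
        (r".", ("skip",)),
    ],
    "comment": [
        (r"\*+/", ("pop",)),
        (r"[\n\r]", ("skip",)),
        (r"[^*]+", ("skip",)),
        (r"\*+", ("skip",)),
    ],
}


def tokenize(text: str) -> list[tuple[str, str]]:
    # mathScanner is the initial scanner; both scanners match every character.
    stack, tokens, pos = ["math"], [], 0
    while pos < len(text):
        best_text, best_action = "", None
        for pattern, action in SCANNERS[stack[-1]]:
            m = re.compile(pattern).match(text, pos)  # re caches compiled patterns
            if m and len(m.group()) > len(best_text):  # strict '>' keeps the earlier rule
                best_text, best_action = m.group(), action
        if best_action[0] == "terminal":
            tokens.append((best_action[1], best_text))
        elif best_action[0] == "push":
            stack.append(best_action[1])
        elif best_action[0] == "pop":
            stack.pop()
        pos += len(best_text)
    if len(stack) > 1:
        raise SyntaxError("unterminated comment")
    return tokens + [("EOF", "")]


class Parser:
    """Recursive-descent counterpart of mathParser1 that evaluates while parsing.
    Left-recursive rules become loops, which keeps + - * / left-associative."""
    def __init__(self, tokens: list[tuple[str, str]]) -> None:
        self.tokens, self.i = tokens, 0

    def peek(self) -> str:
        return self.tokens[self.i][0]

    def take(self, kind: str) -> str:
        found, text = self.tokens[self.i]
        if found != kind:
            raise SyntaxError(f"expected {kind}, found {found}")
        self.i += 1
        return text

    def additive(self) -> float:  # additive-expression
        value = self.multiplicative()
        while self.peek() in ("add", "sub"):
            op = self.take(self.peek())
            rhs = self.multiplicative()
            value = value + rhs if op == "+" else value - rhs
        return value

    def multiplicative(self) -> float:  # multiplicative-expression
        value = self.unary()
        while self.peek() in ("mul", "div"):
            op = self.take(self.peek())
            rhs = self.unary()
            value = value * rhs if op == "*" else value / rhs
        return value

    def unary(self) -> float:  # unary-expression : NUMBER | '(' expression ')'
        if self.peek() == "NUMBER":
            return float(self.take("NUMBER"))
        self.take("lb")
        value = self.additive()  # expression : additive-expression
        self.take("rb")
        return value


def evaluate(text: str) -> float:
    parser = Parser(tokenize(text))
    value = parser.additive()  # [ Start ] expression
    parser.take("EOF")  # the whole input must be consumed
    return value


print(evaluate("2 + 3 * 4"))  # 14.0
print(evaluate("(8 - 4 - 2) * 3 /* comment */"))  # 6.0
```

Rewriting left-recursive rules as loops is a standard technique for hand-written top-down parsers. A parser generated from the same grammar may handle left recursion differently, for example with an LR-style table.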