Com.Bricologica.LALRLib Usage

Introduction

This section explains how to use the parts of the LALRLib, the GUI, the parse stepper, and the command line tool. These are all available at bricologica.com or at github.com. They are licensed under GPLv3. The build files are for Microsoft Visual Studio Express C#.

Deliverables

This section lists the deliverables for the LALRLib package.

The Library

LALRLib.dll contains the code described below.

LALRGUI

LALRGUI is a GUI application that uses the Designer classes to let the user run a grammar.

Parse Stepper

The parse stepper is a GUI application that lets a user step through each parse step of a grammar and inspect the parse and push stacks. The user can step forward and backward.

Code Generators

lalrgen.exe is a command line application that generates the serializable files produced by the generators. This allows a user to skip the time-consuming process of generating the tables. It can also generate the code that builds the tables as compile-time objects.

Documentation

This PDF.

LALRLib

LALRLib is the workhorse of the work I've done. It contains everything needed to create LALR scanners and parsers. The top level is LALRManager, which aggregates the components needed to parse a file: the grammar, scanner, converter, reducer, and parser. It is an abstract class, so the user must derive a class that supplies their more specific operations. An example is the Designer, which supports a user in designing a new grammar, converter, and reducer.

The Manager Template

The top level of the library is the LALRManager. Every component is publicly available, but the LALRManager is a convenient way to manage the whole process. Its three generic type parameters are user-defined classes derived from TokenCreatorBase, ConverterBase, and ReducerBase, respectively. There is a set of Designer classes that can be used here. Applications that use only the Designer versions may simply use the Designer class, which derives from the LALRManager.

Token Creator

The token creator contains a function called Process that processes the token for each terminal type. Process takes an IContext object, the terminal ID, and a ScanStateManager object and uses these to create the new token. The Process function may do any of the following:

Read the matched string to calculate literals
Alter the scan state
Null out the token so that no token is returned
Alter the token (e.g. turn an 'int' into a 'long' based on the length of the string)
Use or update the context
Determine if certain keywords are actually identifiers
Determine if certain identifiers are actually keywords
Anything else that needs to be done
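
The following is a minimal sketch of what a Process function of this kind might look like. It is not the LALRLib API: SketchToken, SketchScanState, SketchTokenCreator, the terminal IDs, and the extra matched-string parameter are all hypothetical stand-ins for IContext, ScanStateManager, and the real token classes. It only illustrates three of the options listed above: nulling out a token, altering a token based on the length of the matched string, and altering the scan state.

```csharp
using System;

// Hypothetical stand-in for a TokenBase-derived token.
public class SketchToken
{
    public string Kind;   // e.g. "int", "long", "ident"
    public string Text;   // the matched string
}

// Hypothetical stand-in for ScanStateManager.
public class SketchScanState
{
    public string Current = "default";
}

public static class SketchTokenCreator
{
    // Terminal IDs invented for this example.
    public const int Whitespace = 0, Integer = 1, CommentStart = 2, Identifier = 3;

    // ASSUMPTION: the matched text is passed in directly; the real Process
    // receives it through objects this document does not describe.
    public static SketchToken Process(object context, int terminalId,
                                      string matched, SketchScanState scanState)
    {
        switch (terminalId)
        {
            case Whitespace:
                return null; // null out the token: nothing is returned

            case Integer:
                // Alter the token: ten or more digits may not fit an int.
                string kind = matched.Length > 9 ? "long" : "int";
                return new SketchToken { Kind = kind, Text = matched };

            case CommentStart:
                scanState.Current = "comment"; // alter the scan state
                return null;

            default:
                return new SketchToken { Kind = "ident", Text = matched };
        }
    }
}
```

Calling Process with the Integer ID and the text "42" yields an 'int' token, while "12345678901" yields a 'long' token; the Whitespace ID yields null.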

Converter

The converter is a module with the same interface as the scanner. It sits between the scanner and parser and contains modules that look for the start of certain patterns in the token stream which may indicate a conversion is necessary. When a pattern starts, the converter pulls in tokens until the pattern is completed or it is determined that the tokens do not match the expected pattern. If a match is found, the converter can manipulate the tokens it examined, doing substitutions or adding delimiters. The examined tokens are held in a token queue, and the converter delivers tokens from that queue.
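
The queue-based behavior described above can be sketched as follows. This is an illustration under stated assumptions, not the ConverterBase API: SketchConverter is a hypothetical class, tokens are reduced to plain strings, and the pattern (an "else" followed by an "if" becomes a single "elseif") is invented for the example.

```csharp
using System.Collections.Generic;

// Hypothetical converter: same shape as a scanner (a Next function),
// but it rewrites the two-token pattern "else" "if" into "elseif".
public class SketchConverter
{
    private readonly IEnumerator<string> source;                 // stands in for the scanner
    private readonly Queue<string> queue = new Queue<string>();  // tokens awaiting delivery

    public SketchConverter(IEnumerable<string> tokens)
    {
        source = tokens.GetEnumerator();
    }

    // Returns the next token, or null at end of input.
    public string Next()
    {
        if (queue.Count == 0) Fill();
        return queue.Count > 0 ? queue.Dequeue() : null;
    }

    private void Fill()
    {
        if (!source.MoveNext()) return;
        string first = source.Current;

        // No pattern starts here: pass the token through.
        if (first != "else") { queue.Enqueue(first); return; }

        // Pattern started: pull one more token to see whether it completes.
        if (!source.MoveNext()) { queue.Enqueue(first); return; }
        string second = source.Current;

        if (second == "if")
        {
            queue.Enqueue("elseif");   // match: substitute one token for two
        }
        else
        {
            queue.Enqueue(first);      // no match: deliver both unchanged
            queue.Enqueue(second);
        }
    }
}
```

For simplicity, a token that ends a failed match is not re-examined as the start of a new pattern; a fuller converter would need to handle that case.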

Reducer

The reducer is an object with one function: Reduce. Reduce takes the rule ID and an array of symbols to reduce by that rule.

Manager Elements

Grammar

The grammar object holds the terminals, non-terminals, and rules. It is configurable from a .lang file or a serializable GrammarInfo object, but does not require any programming.

Scanner

The scanner object holds the scanner. It can be used as a standalone scanner or will automatically feed into the converter or parser. It is configurable by either an IGrammar or by the serializable ScannerInfo, which holds the ScanState-indexed ScanTables.

Parser

The parser object holds the parser. It can be used as a standalone parser or will automatically pull tokens from the scanner or converter if it exists. It is configurable by either an IGrammar or by the serializable ParseTable.

User Written Manager Elements

TokenCreatorBase

A user-derived token creator is needed to create scanned tokens. A token type is determined from ITerminal and may be a user-defined TokenBase-derived class. The token creator simply contains an array of Factory objects that create the tokens. This has to be a derived class because the Factory is a compile-time operation. ITerminal.Id (IToken.Symbol.Id) is the index.

Factory

The factory creates a token of type T from the generic constructor. T must be derived from TokenBase. There is an abstract function Process which takes all the information from the scanner and initializes the token. Process returns a token. Usually it simply returns itself, but it can also return null if the token should not be passed to the parser, or a different token such as a more specific token determined from the scanned text.

ConverterBase

Converters intercept tokens on the way from the scanner to the parser and do any manipulations that need to be done to support parsing. This is part of the design of the grammar, so it requires user-created code. It is possible to cascade multiple converters together under a higher-level converter that contains them.

ReducerBase

The user-derived reducer is the object that reduces a rule using tree objects, which may be user-defined TokenBase- or NodeBase-derived classes, and returns a node that may be a user-defined NodeBase. There is just one function, which takes an IRule, the right-hand side as an array, and the context object.
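
As a rough illustration of the reducer described above, the sketch below reduces two invented rules into a small tree. The names SketchNode, NumberNode, AddNode, and SketchReducer are hypothetical, and an integer rule ID stands in for the IRule argument; the real ReducerBase signature is not reproduced here.

```csharp
using System;

// Hypothetical stand-ins for NodeBase-derived classes.
public abstract class SketchNode { }

public class NumberNode : SketchNode
{
    public int Value;
}

public class AddNode : SketchNode
{
    public SketchNode Left, Right;
}

public static class SketchReducer
{
    // Rule IDs invented for this example:
    //   0: expr -> expr '+' term
    //   1: expr -> term
    // ASSUMPTION: an int rule ID stands in for the IRule argument.
    public static SketchNode Reduce(int ruleId, object[] rhs, object context)
    {
        switch (ruleId)
        {
            case 0:
                // rhs[1] is the '+' token; only the operands are kept.
                return new AddNode { Left = (SketchNode)rhs[0], Right = (SketchNode)rhs[2] };
            case 1:
                // A unit rule can pass its single child through unchanged.
                return (SketchNode)rhs[0];
            default:
                throw new ArgumentException("Unknown rule " + ruleId);
        }
    }
}
```

The design choice shown here is that the reducer decides which right-hand-side symbols survive into the tree, so punctuation tokens can be dropped at reduction time.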

Designer Elements

GrammarGenerator

The GrammarGenerator is exposed by the Designer. It takes lines that represent a grammar and converts them to a GrammarInfo.

Generation

GrammarInfo GrammarGenerator.Generate(String[] lines);

GrammarInfo

GrammarInfo is a serializable class that results from a GrammarGenerator and is sufficient to initialize a Grammar object.

ScannerGenerator

The ScannerGenerator is exposed by the Designer. It takes a Grammar and calculates the ScannerInfo, which is a ScanState-indexed set of ScanTables.

Generation

ScannerInfo ScannerGenerator.Generate(String[] lines);

ScannerInfo

ScannerInfo is a serializable class that results from ScannerGenerator and is sufficient to initialize a Scanner object.

ParserGenerator

The ParserGenerator is exposed by the Designer. It takes a Grammar and calculates the ParseTable.

Generation

ParseTable ParserGenerator.Generate(String[] lines);

ParseTable

ParseTable is a serializable class that results from ParserGenerator and is sufficient to initialize a Parser object. Using a serialized version of the class can be significantly faster than generating the ParseTable from a Grammar object.

TokenBase

User tokens are derived from TokenBase. TokenBase may be the root of a gradually derived token class tree. Extra methods and properties can be added to aid in reduction and in whatever purpose the parse tree will be used for (data of literals, code generation, search and manipulation).

NodeBase

User nodes are derived from NodeBase. NodeBase may be the root of a gradually derived node class tree. Extra methods and properties can be added to aid in reduction and in whatever purpose the parse tree will be used for (code generation, search and manipulation).

Converter

The DesignerConverter takes a DLL name and the name of a class in that DLL and creates an instance of that class in an AppDomain. This allows the DLL to be unloaded for iterative design.

Converter

Plugin

The Designer converter uses an AppDomain to load the converter. This allows unloading when the user wants to recompile for iterative grammar design and troubleshooting.

How it Works

LiteToken

A LiteToken contains only the UniqueId, the Abbrev, and the SymbolId. This makes transport across the AppDomain boundary fast, but it means that converters used in the Designer can only use these three pieces of information to make decisions.

Host

The host creates the AppDomain and loads the client. It then manages communication with the client, which actually controls the ConverterBase-derived class in the loaded DLL.

bool Load(String path, String clientTypeName, String[] abbrevArray);

The load function loads the DLL at 'path' and creates an instance of the class whose fully qualified type name is given in clientTypeName. The array of abbreviations is how the DLL will be able to tie the abbreviation to the SymbolId.

void Unload()

The unload function unloads the AppDomain. It allows the DLL to be recompiled.

Client

The client is the MarshalByRefObject-derived object that straddles the AppDomain.
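
One way a loaded converter might use the abbreviation array together with a LiteToken is sketched below. This is a guess at the mechanism, not the library's code: SketchLiteToken and SketchSymbolMap are hypothetical, and the sketch assumes that the position of an abbreviation in abbrevArray equals its SymbolId, which the text above does not state.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical mirror of LiteToken: only the three fields named above.
[Serializable]
public struct SketchLiteToken
{
    public int UniqueId;
    public string Abbrev;
    public int SymbolId;
}

// Hypothetical helper a converter could build from the array passed to Load.
public class SketchSymbolMap
{
    private readonly Dictionary<string, int> idByAbbrev = new Dictionary<string, int>();

    // ASSUMPTION: abbrevArray[i] is the abbreviation of the symbol with SymbolId i.
    public SketchSymbolMap(string[] abbrevArray)
    {
        for (int i = 0; i < abbrevArray.Length; i++)
            idByAbbrev[abbrevArray[i]] = i;
    }

    // True when the token's SymbolId is the one registered for the abbreviation.
    public bool Is(SketchLiteToken token, string abbrev)
    {
        int id;
        return idByAbbrev.TryGetValue(abbrev, out id) && id == token.SymbolId;
    }
}
```

With a map like this, a converter can compare integer SymbolIds instead of strings while still being written in terms of readable abbreviations.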

LALRSerializationContext

When serializing, it is necessary to pass the LALRSerializationContext to the Deserialize function. Some objects need temporary stores on this object to reconstruct themselves. The user is not required to add anything to this, though they may if they plan on making a larger serializer.

ContextBase

Both token creation and reduction may make use of a context object supplied by the user. This is optional, but any context object passed to IParser.Parse(…) or ITokenizable.Next(…) should be derived from ContextBase.

Designer

Grammar by File

Scanner by Grammar

Plug in Token Stream Converter

Parser by Grammar

Reducer

LALRGUI

Interface User Controls

Terminals User Control

The terminals control takes the IGrammar.TerminalArray as input and displays information about the terminals. It includes the id, name, abbreviation, regular expression, precedence, and associativity.

Properties

ITerminal[] TerminalArray
ITerminal CurrentTerminal

Events

EventHandler CurrentTerminalChanged;

NonTerminals User Control

The nonterminals control takes the IGrammar.NonTerminalArray as input and displays information about the nonterminals. It includes the id, name, and abbreviation.

Properties

INonTerminal[] NonTerminalArray

Rules User Control

The rules control takes IGrammar.RuleArray as input and displays the id, left hand side, right hand side abbreviations, and precedence.

Properties

IRule[] RuleArray

CFSM User Control

The CFSM user control has three elements. The first is a parse stack that contains a series of states. The second is the configuration section. The third is the transition section. When the user double-clicks a reducible configuration, a number of parse states equal to the number of rhs elements is popped off and the new state is shown. If a transition is double-clicked, the new state is placed on the parse stack.

Properties

CFSMState[] StateArray
Int[] ParseStack
Int ShiftLevel

Symbol Stack/Goal Object User Control

The symbol stack and goal object user control shows either the goal object after a successful parse or the shift stack after a failure. The inputs are GoalObject and ShiftStack.

GUI Interactions

As a tree view, the user

Methods

void SetGoalObject(ITreeObject goalObject)
void SetShiftStack(ITreeObject[] shiftStack, int[] parseStack)

Properties

ITreeObject GoalObject
ITreeObject[] ShiftStack
int ShiftLevel

ShiftLevel is an index into the ShiftStack to determine which ShiftLevel the user has double-clicked.

Events

EventHandler OnShiftLevelChanged

When the user double-clicks a CFSM node in a shift stack tree, this event fires so the CFSM can show the corresponding CFSM state.

Token Stream Converter Dialog

Properties

ITerminal[] TerminalArray

Designing a Grammar

Designing Grammars

Lang file format

Configuration for code generation

Grammar usings
Non-empty lines after the line containing "grammar usings" will be placed into the generated code files after the default usings, which are Com.Bricologica.Types and Com.Bricologica.Designer.

Grammar namespace ns
Sets the namespace that all generated code will be in.

ScanStates
Non-empty lines following "scanstates" will be scanstates. The first one is the default scanstate.

Grammar tokenbase tb
Sets the default token base class for when the base class is not specified after : in the Terminals section.

Grammar nodebase nb
Sets the default node base class for when the base class is not specified after : in the NonTerminals section.

Grammar module md
Sets the module name, which is prepended to elements in generated code.

Grammar match mt
Sets a tokenbase that is a match-only token base. If a terminal is derived from this class, then the token will still run IToken.Process(…) but no token will be generated by the scanner.

Configuration for grammar

Terminals
Non-empty lines following Terminals specify terminals. The format is:

name abbrev /regex/ [ScanState] :baseclass [LRN]:[0-9]+

Each part after name is optional. If abbrev is not included, name is used. If regex is not included, name is converted to a regex. If ScanState is not included, the default scan state is used. If baseclass is not included, the default tokenbase is used.

Nonterminals

Name abbrev :baseclass [LRN]:[0-9]+

Rules

Lhs ([0-9]+): rhs…

Parse Results

On Success

On Error