Introduction
This section explains how to use LALRLib and its companion tools: the GUI, the parse stepper, and the command line tool.
These are all available at bricologica.com or at github.com. They are licensed under GPLv3. The build files are for Microsoft Visual Studio Express C#.
Deliverables
This section lists the deliverables for the LALRLib package.
The Library
LALRLib.dll contains the code described below.
LALRGUI
LALRGUI is a GUI application that uses the Designer classes to let the user run a grammar.
Parse Stepper
The parse stepper is a GUI application that lets a user step through each parse step of a grammar and inspect the parse and push stacks. The user can step forward and backward.
Code Generators
lalrgen.exe is a command line application that writes out the serializable files created by the generators. This allows a user to skip the time-consuming process of generating the tables. It can also generate code that builds the table as a compile-time object.
Documentation
This PDF.
LALRLib
LALRLib is the workhorse of the work I've done. It contains everything needed to create LALR scanners and parsers.
The top level is LALRManager, which aggregates the components needed to parse a file: the grammar, scanner, converter, reducer, and parser. It is an abstract class, so the user must derive a class with their more specific operations. An example is the Designer, which supports a user in the design of a new grammar, converter, and reducer.
The Manager Template
The top level of the library is the LALRManager. Every component is publicly available, but the LALRManager is a convenient way to manage the whole process.
The three generic type parameters are user-defined classes derived from TokenCreatorBase, ConverterBase, and ReducerBase, respectively. There is a set of Designer classes that can be used here. Applications that use only the Designer versions may simply use the Designer class, which derives from the LALRManager.
Token Creator
The token creator contains a function called Process that processes the token for each terminal type. Process takes an IContext object, the terminal ID, and a ScanStateManager object and uses these to set up the newly created token. The Process function may do any of the following:
- Read the matched string to calculate literals
- Alter the scan state
- Null out the token so that no token is returned
- Alter the token (e.g. turn an 'int' into a 'long' based on the length of the string)
- Use or update the context
- Determine if certain keywords are actually identifiers
- Determine if certain identifiers are actually keywords
- Anything else that needs to be done
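The kinds of decisions listed above can be pictured with a small stand-alone sketch. Everything in it is hypothetical: SketchToken, SketchScanState, and SketchTokenCreator are illustrative stand-ins and are not the LALRLib types, terminals are identified by plain strings rather than IDs, and the nine-digit cutoff for 'long' is an arbitrary choice.

```csharp
using System;

// Hypothetical stand-ins for illustration only; these are NOT the LALRLib types.
public class SketchToken
{
    public string Kind;   // e.g. "int", "long", "identifier"
    public string Text;   // the matched string
    public long Value;    // literal value, when the token is numeric
}

public class SketchScanState
{
    public string Current = "Default";
}

public static class SketchTokenCreator
{
    // Plays the role of Process: build a token from the matched text,
    // or return null so that no token reaches the parser.
    public static SketchToken Process(string terminal, string matched, SketchScanState state)
    {
        switch (terminal)
        {
            case "whitespace":
                return null;                    // null out the token
            case "commentStart":
                state.Current = "Comment";      // alter the scan state
                return null;
            case "int":
                // Alter the token: long digit strings are re-tagged as 'long'.
                string kind = matched.Length > 9 ? "long" : "int";
                return new SketchToken { Kind = kind, Text = matched, Value = long.Parse(matched) };
            default:
                return new SketchToken { Kind = terminal, Text = matched };
        }
    }
}
```

Calling SketchTokenCreator.Process("int", "12345678901", state) yields a token tagged 'long', while the "whitespace" case yields null so nothing is handed on.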
Converter
The converter is a module with the same interface as the scanner. It sits between the scanner and the parser and contains modules that watch the token stream for the start of patterns that may indicate a conversion is necessary. When a pattern starts, the converter pulls in tokens until the pattern is completed or it is clear that the tokens do not match the expected pattern.
If a match is found, the converter can manipulate the tokens it examined, doing substitutions or adding delimiters. The examined tokens are held in a token queue, and the converter delivers tokens from that queue.
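As a rough illustration of the idea, the sketch below sits between a token source and a consumer and substitutes one token for a two-token pattern. It is a minimal sketch under stated assumptions, not the ConverterBase interface: tokens are plain strings, SketchConverter and its Next method are hypothetical names, and the "else" followed by "if" pattern is invented for the example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch; token kinds are plain strings and none of these names come from LALRLib.
public class SketchConverter
{
    private readonly IEnumerator<string> source;                      // upstream scanner
    private readonly Queue<string> lookahead = new Queue<string>();   // tokens pulled in but not yet delivered

    public SketchConverter(IEnumerable<string> scannerTokens)
    {
        source = scannerTokens.GetEnumerator();
    }

    // Take the next undelivered token: queued tokens first, then the scanner.
    private string Read()
    {
        if (lookahead.Count > 0) return lookahead.Dequeue();
        return source.MoveNext() ? source.Current : null;
    }

    // Same shape as a scanner's "next token" call; returns null at end of input.
    public string Next()
    {
        string first = Read();
        if (first != "else") return first;        // no pattern starts here
        string second = Read();                    // pattern started: pull in one more token
        if (second == "if") return "elseif";       // match: substitute one token for two
        if (second != null) lookahead.Enqueue(second);   // no match: keep it for the next call
        return first;
    }
}
```

Feeding it the tokens if, x, else, if, y produces if, x, elseif, y; a lone else passes through unchanged because the token pulled in after it is queued rather than dropped.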
Reducer
The reducer is an object with one function: Reduce. Reduce takes the rule ID and an array of symbols to reduce by that rule.
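A reducer of this shape can be sketched in a few lines. The example below is only an illustration under assumptions: symbols are plain objects instead of tree objects, the three rule IDs and their meanings are made up, and SketchReducer is not the ReducerBase class.

```csharp
using System;

// Hypothetical sketch, not the LALRLib API: symbols are plain objects and the rule IDs are made up.
public static class SketchReducer
{
    // Assumed rules for this sketch:
    //   0: expr -> expr '+' term
    //   1: expr -> term
    //   2: term -> NUMBER
    public static object Reduce(int ruleId, object[] rhs)
    {
        switch (ruleId)
        {
            case 0: return (int)rhs[0] + (int)rhs[2];   // skip the '+' at rhs[1]
            case 1: return rhs[0];                      // pass the term through
            case 2: return int.Parse((string)rhs[0]);   // turn the matched text into a value
            default: throw new ArgumentException("Unknown rule " + ruleId);
        }
    }
}
```

The switch on rule ID is one simple way to dispatch; a table of delegates indexed by rule ID would serve equally well for larger grammars.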
4.3.2 Manager Elements
Grammar
The grammar object holds the terminals, non-terminals, and rules. It is configurable from a .lang file or a serializable GrammarInfo object, but does not require any programming.
Scanner
The scanner object holds the scanner. It can be used as a standalone scanner or will automatically feed into the converter or parser. It is configurable by either an IGrammar or by the serializable ScannerInfo, which holds the ScanState indexed ScanTables.
Parser
The parser object holds the parser. It can be used as a standalone parser or will automatically pull tokens from the scanner or converter if it exists. It is configurable by either an IGrammar or by the serializable ParseTable.
User Written Manager Elements
4.3.2.4.1 TokenCreatorBase
A user derived token creator is needed to create scanned tokens. A token type is determined from ITerminal and may be a user defined TokenBase derived class. The token creator simply contains an array of Factory objects that create the tokens. This has to be a derived class because the Factory is a compile time operation. ITerminal.Id (IToken.Symbol.Id) is the index into the array.
4.3.2.4.2 Factory
The factory creates a token of type T from the generic constructor. T must be derived from TokenBase. There is an abstract function Process which takes all the information from the scanner and initializes the token. Process returns a token. Usually it simply returns the token itself, but it can also return null if the token should not be passed to the parser, or a different token such as a more specific token determined from the scanned text.
4.3.2.4.3 ConverterBase
Converters intercept tokens on the way from the scanner to the parser and do any manipulations that need to be done to support parsing. This is part of the design of the grammar, so this will require user created code. It is possible to have multiple converters cascaded together with a higher level converter that contains the cascaded converters.
4.3.2.4.4 ReducerBase
The user derived reducer is the object that reduces a rule. It works on tree objects, which may be user-defined TokenBase or NodeBase derived classes, and returns a node that may be a user-defined NodeBase. There is just one function, which takes an IRule, the right hand side as an array, and the context object.
Designer Elements
4.3.3.1 GrammarGenerator
The GrammarGenerator is exposed by the Designer. It takes lines that represent a grammar and converts them to a GrammarInfo.
4.3.3.1.1 Generation
GrammarInfo GrammarGenerator.Generate(String[] lines);
4.3.3.1.2 GrammarInfo
GrammarInfo is a serializable class that results from a GrammarGenerator and is sufficient to initialize a Grammar object.
4.3.3.2 ScannerGenerator
The ScannerGenerator is exposed by the Designer. It takes a Grammar and calculates the ScannerInfo which is a ScanState indexed set of ScanTables.
4.3.3.2.1 Generation
ScannerInfo ScannerGenerator.Generate(String[] lines);
4.3.3.2.2 ScannerInfo
ScannerInfo is a serializable class that results from the ScannerGenerator and is sufficient to initialize a Scanner object.
4.3.3.3 ParserGenerator
The ParserGenerator is exposed by the Designer. It takes a Grammar and calculates the ParseTable.
4.3.3.3.1 Generation
ParseTable ParserGenerator.Generate(String[] lines);
4.3.3.3.2 ParseTable
ParseTable is a serializable class that results from ParserGenerator and is sufficient to initialize a Parser object. Using a serialized version of the class can be significantly faster than generating the ParseTable from a Grammar object.
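One way to take advantage of this is to generate the table once and reuse the serialized file on later runs. The helper below is a minimal sketch of that pattern, assuming nothing about how LALRLib serializes its objects: SketchTableCache is a hypothetical name, and generation, reading, and writing are all passed in as delegates so that the real ParserGenerator and serialization calls can be plugged in by the caller.

```csharp
using System;
using System.IO;

// Hypothetical helper; the actual generate/read/write steps are supplied by the caller.
public static class SketchTableCache
{
    public static T LoadOrGenerate<T>(
        string path,
        Func<T> generate,            // slow path: build the table from the grammar
        Func<Stream, T> read,        // fast path: deserialize a previously saved table
        Action<Stream, T> write)     // save a freshly generated table for next time
    {
        if (File.Exists(path))
        {
            using (FileStream input = File.OpenRead(path))
            {
                return read(input);
            }
        }

        T table = generate();
        using (FileStream output = File.Create(path))
        {
            write(output, table);
        }
        return table;
    }
}
```

The sketch does not detect a grammar that has changed since the file was written; deleting the file forces regeneration.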
4.3.3.4 TokenBase
User tokens are derived from TokenBase. TokenBase may be the root of a tree of progressively more specific token classes. Extra methods and properties can be added to help with reduction and with whatever the parse tree will be used for (data of literals, code generation, search and manipulation).
4.3.3.5 NodeBase
User nodes are derived from NodeBase. NodeBase may be the root of a tree of progressively more specific node classes. Extra methods and properties can be added to help with reduction and with whatever the parse tree will be used for (code generation, search and manipulation).
Converter
The DesignerConverter takes a DLL name and the name of a class in that DLL and creates an instance of that class in an AppDomain. This allows the DLL to be unloaded for iterative design.
Converter
Plugin
The Designer converter uses an AppDomain to load in the converter. This allows unloading when the user wants to recompile for iterative grammar design and troubleshooting.
4.3.4.2.1 How it Works
4.3.4.2.2 LiteToken
A LiteToken contains only the UniqueId, the Abbrev, and the SymbolId. This keeps transport across the AppDomain boundary fast. It also means that converters intended for use in the Designer can only use these three pieces of information to make decisions.
4.3.4.2.3 Host
The host creates the AppDomain and loads the client. It then manages communication to the client which actually controls the loaded ConverterBase derived class in the loaded DLL.
bool Load(String path, String clientTypeName, String[] abbrevArray);
The load function loads the DLL at 'path' and creates an instance of the class whose fully qualified type name is given by clientTypeName. The array of abbreviations is how the DLL is able to tie each abbreviation to its SymbolId.
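On the DLL side, a converter typically wants to look up a SymbolId by abbreviation. The sketch below shows one way to build that lookup. It rests on an assumption the text does not confirm, namely that the position of an abbreviation in the array is its SymbolId; SketchAbbrevMap is a hypothetical helper and not part of the library.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of LALRLib.
public static class SketchAbbrevMap
{
    // Assumption: the position of an abbreviation in the array is its SymbolId.
    public static Dictionary<string, int> Build(string[] abbrevArray)
    {
        var map = new Dictionary<string, int>();
        for (int id = 0; id < abbrevArray.Length; id++)
        {
            map[abbrevArray[id]] = id;
        }
        return map;
    }
}
```

With such a map, a converter can compare the SymbolId of each incoming LiteToken against IDs resolved once at load time instead of comparing strings per token.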
void Unload();
The unload function unloads the AppDomain so that the DLL can be recompiled.
4.3.4.2.4 Client
The client is the MarshalByRefObject derived object that straddles the AppDomain.
LALRSerializationContext
When deserializing, it is necessary to pass the LALRSerializationContext to the Deserialize function. Some objects need temporary stores on this object to reconstruct themselves. The user is not required to add anything to it, though they may if they plan on making a larger serializer.
ContextBase
Both token creation and reduction may make use of a context object supplied by the user. This is optional, but any context object passed to IParser.Parse(…) or ITokenizable.Next(…) should be derived from ContextBase.
Designer
4.3.5.1 Grammar by File
4.3.5.2 Scanner by Grammar
4.3.5.3 Plug in Token Stream Converter
4.3.5.4 Parser by Grammar
4.3.5.5 Reducer
LALRGUI
Interface User Controls
4.4.1.1 Terminals User Control
The terminals control takes the IGrammar.TerminalArray as input and displays information about the terminals. It includes the id, name, abbreviation, regular expression, precedence, and associativity.
4.4.1.1.1 Properties
ITerminal[] TerminalArray
ITerminal CurrentTerminal
4.4.1.1.2 Events
EventHandler CurrentTerminalChanged;
4.4.1.2 NonTerminals User Control
The nonterminals control takes the IGrammar.NonTerminalArray as input and displays information about the nonterminals. It includes the id, name, and abbreviation.
4.4.1.2.1 Properties
INonTerminal[] NonTerminalArray
4.4.1.3 Rules User Control
The rules control takes IGrammar.RuleArray as input and displays the id, left hand side, right hand side abbreviations, and precedence.
4.4.1.3.1 Properties
IRule[] RuleArray
4.4.1.4 CFSM User Control
The CFSM user control has three elements: a parse stack that contains a series of states, the configuration section, and the transition section. When the user double-clicks a reducible configuration, one parse state is popped off for each rhs element of the rule and the new state is shown. If a transition is double-clicked, the new state is placed on the parse stack.
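The two stack operations can be sketched as follows. This is an illustration of standard LR stack handling rather than the control's actual code: SketchParseStack is a hypothetical name, and the goto lookup that chooses the state pushed after a reduction is passed in as a delegate because the text does not describe how the control finds it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the parse stack operations; not the CFSM user control's code.
public static class SketchParseStack
{
    // Reduce: pop one state per right-hand-side element, then push the state
    // reached from the newly exposed state (standard LR behavior, assumed here).
    public static void Reduce(Stack<int> parseStack, int rhsLength, Func<int, int> gotoState)
    {
        for (int i = 0; i < rhsLength; i++)
        {
            parseStack.Pop();
        }
        parseStack.Push(gotoState(parseStack.Peek()));
    }

    // Transition (shift): the new state is simply placed on the stack.
    public static void Shift(Stack<int> parseStack, int newState)
    {
        parseStack.Push(newState);
    }
}
```

For a rule with three rhs elements, Reduce removes three states and then pushes one, so the stack ends up two entries shorter.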
4.4.1.4.1 Properties
CFSMState[] StateArray
int[] ParseStack
int ShiftLevel
4.4.1.5 Symbol Stack/Goal Object User Control
The symbol stack and goal object user control shows either the goal object after a successful parse or the shift stack after a failure. The inputs are GoalObject and ShiftStack.
4.4.1.5.1 GUI Interactions
As a tree view, the user
4.4.1.5.2 Methods
void SetGoalObject(ITreeObject goalObject)
void SetShiftStack(ITreeObject[] shiftStack, int[] parseStack)
4.4.1.5.3 Properties
ITreeObject GoalObject
ITreeObject[] ShiftStack
int ShiftLevel
ShiftLevel is an index into the ShiftStack that identifies which entry the user has double-clicked.
4.4.1.5.4 Events
EventHandler OnShiftLevelChanged
When the user double-clicks a CFSM node in a shift stack tree, this event fires so the CFSM can show the corresponding CFSM state.
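The wiring between the two controls can be pictured with plain classes. This is a minimal sketch with hypothetical names (SketchSymbolStackView, SketchCfsmView, ShownLevel); it mirrors only the ShiftLevel property and the OnShiftLevelChanged event named above and leaves out everything related to the actual GUI.

```csharp
using System;

// Hypothetical stand-in for the symbol stack control.
public class SketchSymbolStackView
{
    private int shiftLevel = -1;

    public event EventHandler OnShiftLevelChanged;

    public int ShiftLevel
    {
        get { return shiftLevel; }
        set
        {
            if (shiftLevel == value) return;       // fire only on a real change
            shiftLevel = value;
            EventHandler handler = OnShiftLevelChanged;
            if (handler != null) handler(this, EventArgs.Empty);
        }
    }
}

// Hypothetical stand-in for the CFSM control.
public class SketchCfsmView
{
    public int ShownLevel = -1;

    // Subscribe so that this view follows the level selected in the stack view.
    public void Attach(SketchSymbolStackView stackView)
    {
        stackView.OnShiftLevelChanged += delegate(object sender, EventArgs e)
        {
            ShownLevel = ((SketchSymbolStackView)sender).ShiftLevel;
        };
    }
}
```

Setting ShiftLevel on the stack view, as a double-click handler would, updates ShownLevel on every attached CFSM view.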
4.4.1.6 Token Stream Converter Dialog
4.4.1.6.1 Properties
ITerminal[] TerminalArray
4.5 Designing a Grammar
Designing Grammars
4.5.1.1 Lang file format
4.5.1.1.1 Configuration for code generation
Grammar usings
Non-empty lines after the line containing "grammar usings" are placed into the generated code files after the default usings, which are Com.Bricologica.Types and Com.Bricologica.Designer.
Grammar namespace ns
Sets the namespace that all generated code will be in.
ScanStates
Non-empty lines following "scanstates" define the scan states. The first one is the default scan state.
Grammar tokenbase tb
Sets the default token base class for when the base class is not specified after : in the Terminals section.
Grammar nodebase nb
Sets the default node base class for when the base class is not specified after : in the NonTerminals section.
Grammar module md
Sets the module name which is prepended to elements in generated code.
Grammar match mt
Sets a tokenbase that is a match-only token base. If a terminal is derived from this class, then the token will still run IToken.Process(…) but no token will be generated by the scanner.
4.5.1.1.2 Configuration for grammar
Terminals
Non-empty lines following Terminals specify terminals. The format is:
name abbrev /regex/ [ScanState] :baseclass [LRN]:[0-9]+
where every part after name is optional.
If abbrev is not included, name is used. If regex is not included, name is converted to a regex. If ScanState is not included, the default scan state is used. If baseclass is not included, the default tokenbase is used.
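The last field of the format, [LRN]:[0-9]+, is itself a small pattern. The sketch below checks and splits that one field. It is a hedged illustration only: SketchPrecedenceField is a hypothetical helper, and reading the letter as associativity and the number as precedence is an assumption based on the precedence and associativity columns of the terminals control, not something the format description states.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; it parses only the trailing [LRN]:[0-9]+ field.
public static class SketchPrecedenceField
{
    private static readonly Regex Field = new Regex("^([LRN]):([0-9]+)$");

    // Returns true and fills the outputs when 'text' has the form [LRN]:[0-9]+.
    // Assumption: the letter is the associativity and the number is the precedence.
    public static bool TryParse(string text, out char associativity, out int precedence)
    {
        associativity = 'N';
        precedence = 0;
        Match m = Field.Match(text);
        if (!m.Success) return false;
        if (!int.TryParse(m.Groups[2].Value, out precedence)) return false;
        associativity = m.Groups[1].Value[0];
        return true;
    }
}
```

For example, "L:10" is accepted with letter 'L' and number 10, while "X:10" and "L:" are rejected.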
Nonterminals
Name abbrev :baseclass [LRN]:[0-9]+
Rules
Lhs ([0-9]+): rhs…
4.5.1.2 Parse Results
4.5.1.2.1 On Success
4.5.1.2.2 On Error