-
SymbolReferenceSyntaxRule (EqualsToken)
-
SymbolReferenceSyntaxRule (Expression)
This topic explains how to create productions for a non-terminal symbol.
The following topics are prerequisites to understanding this topic:
This topic contains the following sections:
As described in the Non-Terminal Symbols topic, productions are a set of definitions which indicate the various ways a non-terminal symbol can represent a sequence of symbols. Each production body defines the left to right order in which it expects to find a sequence of symbols. There are a few important things to note about productions:
There can be multiple productions with the same head – non-terminals symbols do not have to only represent one sequence of symbols. There can be multiple productions for the same non-terminal symbol. For example, these can be some of the productions for the VariableDeclarationStatement non-terminal symbol in C#:
VariableDeclarationStatement → Type IdentifierToken SemicolonToken
VariableDeclarationStatement → Type IdentifierToken EqualsToken Expression SemicolonToken
VariableDeclarationStatement → VarKeyword IdentifierToken EqualsToken Expression SemicolonToken
Now if any of these sequences of symbols are found in a document in places where a variable declaration statement is expected, that sequence can be represented by the VariableDeclarationStatement non-terminal symbol.
A production body can be empty:
OmittedArgument →
This production is valid and is used by the Visual Basic language. This means if the parser were expecting to find an “OmittedArgument” as the next construct in a document, it could just indictate that it found it in its empty form and the next token actually belongs to the content that was supposed to be found after the “OmittedArgument”. One way to use this might be like so:
OmittedArgument →
Argument → Expression
Argument → OmittedArgument
ArgumentList → OpenParen Argument Comma Argument CloseParen
Now when looking for an argument, if one is missing (which is allowed for VB when the parameter has a default value), the parser can assume that it found the “OmittedArgument” in the place where the “Argument” was expected. So in the argument list “(,)”, there would be two “OmittedArgument” non-terminal symbols found.
A production body can contain its own head symbol as one of its symbols. This can be useful for recursive structures and symbols which contain repeated items. For example, a block of code allows for zero or more statements in C#. If we wanted to define a non-terminal symbol called “Statements” to represent a block, it would be impossible without recursively defined symbols because it would require an infinite number of productions: one for each number of statements allowed. But with recursively defined symbols, it is possible to define the following productions to represent all number of statements (the subscripts on the head symbols are only to facilitate the explanation below and do not affect the symbols or productions):
Statements1 →
Statements2 → Statement Statements
The 2nd production refers to the “Statements” non-terminal symbol in the body. “Statements” is now recursively defined. Now the syntax tree will look like the following based on a various numbers of statements found in a possible block:
If zero statements are found:
Statements1
If one statement is found:
Statements2
Statement
Statements1
If two statements are found:
Statements2
Statement
Statements2
Statement
Statements1
And so on…
As mentioned in Non-Terminal Symbols topic, the following are some of the possible productions for a variable declaration statement in C#:
VariableDeclarationStatement → Type IdentifierToken SemicolonToken
VariableDeclarationStatement → Type IdentifierToken EqualsToken Expression SemicolonToken
VariableDeclarationStatement → VarKeyword IdentifierToken EqualsToken Expression SemicolonToken
Instead of requiring each production to be defined separately, the Syntax Parsing Engine allows for all productions with matching head symbols to be defined together by merging all productions together, like so:
VariableDeclarationStatement →
(Type IdentifierToken [EqualsToken Expression] SemicolonToken)
| (VarKeyword IdentifierToken EqualsToken Expression SemicolonToken)
In this pseudo-production, parentheses group symbols together, square brackets contain optional sequences of symbols, and the bar charater ‘|’ separates two alternatives sequences. This notation describes the same three productions above, but in a more compact fashion.
The pseudo-production in the example above can be described with a tree structure of syntax rules, the root of which is exposed by the Rule property of the head NonTerminalSymbol. Here is what the tree structure would look like in this case:
NonTerminalSymbol.Rule
AlternationSyntaxRule
ConcatenationSyntaxRule
SymbolReferenceSyntaxRule (Type)
SymbolReferenceSyntaxRule (IdentifierToken)
OptionalSyntaxRule
ConcatenationSyntaxRule
SymbolReferenceSyntaxRule (EqualsToken)
SymbolReferenceSyntaxRule (Expression)
SymbolReferenceSyntaxRule (SemicolonToken)
ConcatenationSyntaxRule
SymbolReferenceSyntaxRule (VarKeyword)
SymbolReferenceSyntaxRule (IdentifierToken)
SymbolReferenceSyntaxRule (EqualsToken)
SymbolReferenceSyntaxRule (Expression)
SymbolReferenceSyntaxRule (SemicolonToken)
As you can see the Rule property of the NonTerminalSymbol class indicates to the syntax analyzer the various ways in which the non-terminal symbol can be formed from other symbols. Here is a table of all available syntax rules:
The above tree may be defined using C# in the following way (Symbol implicitly converts to SymbolReferenceSyntaxRule
, so symbols can be specified directly when a rule referencing them is needed):
In C#:
var grammar = new Grammar();
var defaultLexerState = grammar.LexerStates.DefaultLexerState;
// Include the built-in symbols which should be matched in the default lexer state.
defaultLexerState.Symbols.Add(grammar.WhitespaceSymbol);
defaultLexerState.Symbols.Add(grammar.NewLineSymbol);
var equals = defaultLexerState.Symbols.Add("EqualsToken", "=");
var semicolon = defaultLexerState.Symbols.Add("SemicolonToken", ";");
var varKeyword = defaultLexerState.Symbols.Add("VarKeyword", "var");
var identifier = defaultLexerState.Symbols.Add("IdentifierToken",
"[_a-zA-Z][_a-zA-Z0-9]*", TerminalSymbolComparison.RegularExpression);
var type = grammar.NonTerminalSymbols.Add("Type");
type.Rule = identifier; // Implicit conversion
var expression = grammar.NonTerminalSymbols.Add("Expression");
expression.Rule = ...
var variableDeclarationStatement =
grammar.NonTerminalSymbols.Add("VariableDeclarationStatement");
variableDeclarationStatement.Rule =
new AlternationSyntaxRule(
new ConcatenationSyntaxRule(
type, // Implicit conversions
identifier,
new OptionalSyntaxRule(
new ConcatenationSyntaxRule(
equals,
expression
)
),
semicolon
),
new ConcatenationSyntaxRule(
varKeyword,
identifier,
equals,
expression,
semicolon
)
);
grammar.StartSymbol = variableDeclarationStatement;
The following topics provide additional information related to this topic.