This topic explains the lexical errors found by the Syntax Parsing Engine.
Lexical errors are easy to detect, and the lexical analyzer recovers from them just as easily. There is really only one kind of lexical error: none of the terminal symbols in the current lexer state can match the text at the current location.
Consider a grammar with a default lexer state defined as having the following terminal symbols:
NewLineSymbol – matches one new line, which is “\r”, “\n”, or “\r\n”.
WhitespaceSymbol – matches one or more space or tab characters.
Identifier – matches an underscore or letter followed by zero or more underscores, letters, or digits.
Document content to be parsed: “x += y;”
While analyzing this document content, the lexer first creates the tokens <Identifier, “x”> and <WhitespaceToken, “ ”>. It then reaches the “+” character, which no terminal symbol can match. The lexer keeps reading unrecognized characters until it reaches text that some terminal symbol can match; in this case, that is the space after the “=”. It then gathers all the contiguous unrecognized characters into a single match of the special symbol exposed by Grammar.UnrecognizedSymbol. The same happens with the trailing “;”, whose unrecognized run extends to the end of the content. The full token stream for this document content is therefore the following:
<Identifier, “x”> <WhitespaceToken, “ ”> <UnrecognizedToken, “+=”> <WhitespaceToken, “ ”> <Identifier, “y”> <UnrecognizedToken, “;”> <EndOfStreamToken>
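The recovery behavior described above can be approximated with a short Python sketch. This is only an illustration under stated assumptions, not the Syntax Parsing Engine's implementation or API: the regular expressions, the function names match_terminal and tokenize, the restriction of "letter" to ASCII letters, and the use of symbol names as token labels are all hypothetical choices made for the example.

```python
import re

# Hypothetical stand-ins for the three terminal symbols of the default lexer
# state. "Letter" is assumed to mean an ASCII letter for this sketch.
TERMINALS = [
    ("NewLineSymbol", re.compile(r"\r\n|\r|\n")),
    ("WhitespaceSymbol", re.compile(r"[ \t]+")),
    ("Identifier", re.compile(r"[A-Za-z_][A-Za-z0-9_]*")),
]


def match_terminal(text, pos):
    """Return (symbol_name, matched_text) for the first terminal symbol
    that matches at pos, or None if no terminal symbol matches there."""
    for name, pattern in TERMINALS:
        m = pattern.match(text, pos)
        if m:
            return name, m.group()
    return None


def tokenize(text):
    """Tokenize text, grouping each run of unrecognized characters into a
    single token labeled UnrecognizedSymbol (label chosen for illustration)."""
    tokens = []
    pos = 0
    while pos < len(text):
        hit = match_terminal(text, pos)
        if hit is not None:
            tokens.append(hit)
            pos += len(hit[1])
            continue
        # Lexical error: no terminal symbol matches here. Skip forward until
        # some terminal symbol matches again or the input ends.
        start = pos
        while pos < len(text) and match_terminal(text, pos) is None:
            pos += 1
        tokens.append(("UnrecognizedSymbol", text[start:pos]))
    tokens.append(("EndOfStream", ""))
    return tokens


if __name__ == "__main__":
    for token in tokenize("x += y;"):
        print(token)
```

Running the sketch on “x += y;” produces the same grouping as the token stream above: “+=” becomes one unrecognized token and “;” becomes another, each bounded by the nearest position at which a terminal symbol matches or by the end of the content.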