GenericLexerTokenChannels
By default, the generic lexer skips comment tokens and whitespace tokens.
This keeps the grammar free of data it does not need.
But sometimes comments are meaningful and we want to get them back.
To solve this, CSLY borrows the channel concept from ANTLR: tokens are never skipped, they are simply sent to different channels:
- Main channel (0): the default channel, used by the parser to match grammar rules
- WhiteSpaces channel (1): the channel containing whitespace tokens
- Comments channel (2): the channel containing all comments
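To make the idea concrete, here is a minimal, self-contained C# sketch of a lexer that never drops a token and instead tags each one with a channel number. It is not CSLY's generic lexer: `ChannelLexer`, `Lex` and `MainChannel` are hypothetical names, and it only recognizes alphanumeric identifiers, whitespace and `//` comments. The channel numbers follow the list above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch, not CSLY's implementation: every token is kept
// and tagged with the channel it belongs to.
public static class ChannelLexer
{
    public const int Main = 0, WhiteSpaces = 1, Comments = 2;

    public static List<(string Value, int Channel)> Lex(string source)
    {
        var tokens = new List<(string Value, int Channel)>();
        int i = 0;
        while (i < source.Length)
        {
            int start = i;
            if (char.IsWhiteSpace(source[i]))
            {
                // Whitespace is not skipped: it goes to the WhiteSpaces channel.
                while (i < source.Length && char.IsWhiteSpace(source[i])) i++;
                tokens.Add((source.Substring(start, i - start), WhiteSpaces));
            }
            else if (source[i] == '/' && i + 1 < source.Length && source[i + 1] == '/')
            {
                // A single-line comment runs up to (not including) the end of the line.
                while (i < source.Length && source[i] != '\n') i++;
                tokens.Add((source.Substring(start, i - start), Comments));
            }
            else if (char.IsLetterOrDigit(source[i]))
            {
                // Identifiers are the only tokens on the main channel here.
                while (i < source.Length && char.IsLetterOrDigit(source[i])) i++;
                tokens.Add((source.Substring(start, i - start), Main));
            }
            else
            {
                throw new ArgumentException($"Unexpected character '{source[i]}' at {i}");
            }
        }
        return tokens;
    }

    // What the parser matches grammar rules against: the main channel only.
    public static List<string> MainChannel(List<(string Value, int Channel)> tokens) =>
        tokens.Where(t => t.Channel == Main).Select(t => t.Value).ToList();
}
```

For the input `"// first\nfoo bar"` it yields five tokens (a comment, a newline, `foo`, a space, `bar`), while `MainChannel` returns only `foo` and `bar`, which is all the parser needs to see.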
Channels are defined with an optional parameter on the [Lexeme] attribute (and on all the shorthand lexeme attributes).
Channels are accessed through the Token in production rule visitors. The Token class defines methods to access tokens on other channels:
- `Token Next(int channel)`: the token on channel `channel` that immediately follows the token.
- `Token Previous(int channel)`: the token on channel `channel` that immediately precedes the token.
- `List<Token> NextTokens(int channel)`: the list of tokens on channel `channel` that immediately follow the token.
- `List<Token> PreviousTokens(int channel)`: the list of tokens on channel `channel` that immediately precede the token.
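These navigation methods can be pictured as walks over the shared token flow. The sketch below is a hypothetical model, not CSLY's `Token` class: `MiniToken`, `TokenFlow` and `Chan` are illustrative names, and it assumes "immediately" means "without crossing another main-channel token", which is one reasonable reading of the descriptions above.

```csharp
using System.Collections.Generic;

// Hypothetical channel numbers mirroring the three default channels.
public static class Chan
{
    public const int Main = 0;
    public const int WhiteSpaces = 1;
    public const int Comments = 2;
}

// Hypothetical token: a value, a channel, and a link back to the shared flow.
public sealed class MiniToken
{
    public string Value { get; }
    public int Channel { get; }
    internal List<MiniToken> Flow;
    internal int Position;

    public MiniToken(string value, int channel) { Value = value; Channel = channel; }

    // Tokens on `channel` after this one, stopping at the next main-channel token.
    public List<MiniToken> NextTokens(int channel)
    {
        var result = new List<MiniToken>();
        for (int i = Position + 1; i < Flow.Count && Flow[i].Channel != Chan.Main; i++)
            if (Flow[i].Channel == channel) result.Add(Flow[i]);
        return result;
    }

    // Tokens on `channel` before this one, stopping at the previous main-channel
    // token, returned in source order.
    public List<MiniToken> PreviousTokens(int channel)
    {
        var result = new List<MiniToken>();
        for (int i = Position - 1; i >= 0 && Flow[i].Channel != Chan.Main; i--)
            if (Flow[i].Channel == channel) result.Insert(0, Flow[i]);
        return result;
    }

    // Closest following token on `channel`, or null if there is none.
    public MiniToken Next(int channel)
    {
        var tokens = NextTokens(channel);
        return tokens.Count > 0 ? tokens[0] : null;
    }

    // Closest preceding token on `channel`, or null if there is none.
    public MiniToken Previous(int channel)
    {
        var tokens = PreviousTokens(channel);
        return tokens.Count > 0 ? tokens[tokens.Count - 1] : null;
    }
}

public static class TokenFlow
{
    // Links every token to the shared flow so navigation can cross channels.
    public static List<MiniToken> Build(params MiniToken[] tokens)
    {
        var flow = new List<MiniToken>(tokens);
        for (int i = 0; i < flow.Count; i++) { flow[i].Flow = flow; flow[i].Position = i; }
        return flow;
    }
}
```

With the flow `// first`, `\n`, `foo`, ` `, `bar`, calling `Previous(Chan.Comments)` on `foo` returns the comment, while on `bar` it returns null because `foo`, a main-channel token, sits in between.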
Let's define a language that consists only of a list of IDs, each of which may be commented. The parser's goal is to return the list of IDs, attaching the comment as metadata when a comment precedes an ID.
The lexer has only 2 lexemes: an identifier and a comment.
using sly.lexer;

public enum Lex
{
    [SingleLineComment("//")]
    SINGLELINECOMMENT,

    [Lexeme(GenericToken.Identifier, IdentifierType.AlphaNumeric)]
    ID
}
The grammar is quite simple, as it is a mere list of IDs:
program : ID*
Our AST is really simple:
using System.Collections.Generic;

public class CommentedId
{
    public string Id { get; set; }
    public string Comment { get; set; }
    public bool IsCommented => !string.IsNullOrEmpty(Comment);
}

public class CommentedProgram
{
    public List<CommentedId> Ids { get; set; }
}
So the parser can be defined as:
using System.Collections.Generic;
using System.Linq;
using sly.lexer;
using sly.parser.generator;

public class Parse
{
    [Production("program : ID*")]
    public CommentedProgram Program(List<Token<Lex>> ids)
    {
        var commentedIds = ids.Select(token =>
        {
            // get the comment-channel token immediately preceding each id (null if none)
            var previous = token.Previous(Channels.Comments);
            string comment = null;
            // defensive check: make sure the preceding token really is a comment
            if (previous != null && previous.TokenID == Lex.SINGLELINECOMMENT)
            {
                comment = previous.Value;
            }
            return new CommentedId()
            {
                Id = token.Value,
                Comment = comment
            };
        }).ToList();
        return new CommentedProgram() { Ids = commentedIds };
    }
}
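The comment-attaching logic of the `Program` visitor can be checked without CSLY. The sketch below is hypothetical: it replaces the lexer and the `Token` API with a plain list of `(Value, Channel)` pairs and reimplements `Previous(Channels.Comments)` under the assumption that a comment belongs to an ID only when no other main-channel token separates them. `CommentAttacher` is an illustrative name.

```csharp
using System.Collections.Generic;

// Stand-in for the AST above.
public class CommentedId
{
    public string Id { get; set; }
    public string Comment { get; set; }
    public bool IsCommented => !string.IsNullOrEmpty(Comment);
}

// Hypothetical re-implementation of the Program visitor over a flat token flow.
public static class CommentAttacher
{
    public const int Main = 0, WhiteSpaces = 1, Comments = 2;

    public static List<CommentedId> Attach(List<(string Value, int Channel)> flow)
    {
        var result = new List<CommentedId>();
        for (int i = 0; i < flow.Count; i++)
        {
            // Only main-channel tokens are IDs seen by the grammar.
            if (flow[i].Channel != Main) continue;

            // Walk backwards, not past the previous ID, to the closest comment.
            string comment = null;
            for (int j = i - 1; j >= 0 && flow[j].Channel != Main; j--)
            {
                if (flow[j].Channel == Comments) { comment = flow[j].Value; break; }
            }
            result.Add(new CommentedId { Id = flow[i].Value, Comment = comment });
        }
        return result;
    }
}
```

Feeding it the flow for `// first\nfoo bar` gives `foo` commented with `// first` and `bar` with no comment.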