Antlr backend - quotation marks in bracket expressions are escaped when they shouldn't #319

Closed
fonfalleh opened this issue Nov 10, 2020 · 5 comments · Fixed by #321

@fonfalleh

It seems the only characters that should be escaped in bracket expressions in regexes are ], \, and -. I'm not sure if this means that there needs to be different escaping in different contexts.
https://github.com/antlr/antlr4/blob/master/doc/lexer-rules.md#lexer-rule-elements

Example token rule that generates broken code (the rule itself is not necessarily good or correct; I just noticed that the resulting lexer file doesn't work):
token NoteToken ["abcdefgr"]({"es"} | {"is"})*["\',"]*(digit)*["."]* ;
results in the following line in the Lexer.g4 file
NoteToken : [abcdefgr]('e''s'|'i''s')*[\',]*DIGIT*'.'*;
which generates the following when building:
warning(156): lily/lilyLexer.g4:83:38: invalid escape sequence \'

The build also complains about the following line:
STRINGTEXT : ~[\"\\] -> more;

, "STRINGTEXT : ~[\\\"\\\\] -> more;"

The build works as expected when removing the extra backslashes as follows:
NoteToken : [abcdefgr]('e''s'|'i''s')*[',]*DIGIT*'.'*;
...
STRINGTEXT : ~["\\] -> more;
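A minimal Haskell sketch of context-dependent escaping that would produce the working lines above. The `Context` type and the names `escapeIn`, `charSet`, and `notCharSet` are hypothetical, introduced only for illustration; this is not BNFC's code and not necessarily how the issue was fixed. It assumes that inside brackets only `]`, `\`, and `-` take a backslash, while inside a quoted literal `'` and `\` do.

```haskell
-- | Where a character is printed in an ANTLR lexer rule.
-- (Hypothetical type, introduced only for this sketch.)
data Context = InLiteral | InBrackets

-- | Escape one character for the given context.
-- Assumption: a backslash and the common control characters are written
-- the same way in both contexts; only the remaining special characters differ.
escapeIn :: Context -> Char -> String
escapeIn ctx c = case c of
  '\n' -> "\\n"
  '\r' -> "\\r"
  '\t' -> "\\t"
  '\\' -> "\\\\"
  _ | c `elem` special ctx -> ['\\', c]
    | otherwise            -> [c]
  where
    special InLiteral  = "'"   -- the quote would end the literal
    special InBrackets = "]-"  -- ']' would end the set, '-' denotes a range

-- | Print a set of characters as a bracket expression.
charSet :: String -> String
charSet cs = "[" ++ concatMap (escapeIn InBrackets) cs ++ "]"

-- | Print the complement of a set of characters.
notCharSet :: String -> String
notCharSet cs = "~" ++ charSet cs
```

With these definitions, `charSet "',"` yields `[',]` and `notCharSet "\"\\"` yields `~["\\]`, matching the two corrected lines, whereas `escapeIn InLiteral '\''` still yields `\'` for use inside a quoted literal.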


Sidenote:
I first thought this could be related to this line, which references RegToJLex.hs instead of RegToAntlrLexer.hs, but it seems the reference is correct, even if the naming is confusing.

[ text name <> " : " <> text (printRegJLex exp) <> ";"

Export from RegToAntlrLexer:

module BNFC.Backend.Java.RegToAntlrLexer (printRegJLex, escapeChar) where

@andreasabel andreasabel added Java/ANTLR lexer Concerning the generated lexer bug labels Nov 11, 2020
@andreasabel
Member
andreasabel commented Nov 11, 2020

It seems that the regular expression printer does not apply special printing rules when printing content in bracketed expressions, but maybe it should, according to the rules you quoted above:

The following escaped characters are interpreted as single special characters: \n, \r, \b, \t, \f, \uXXXX, and \u{XXXXXX}. To get ], \, or - you must escape them with \.
(From https://github.com/antlr/antlr4/blob/master/doc/lexer-rules.md#lexer-rule-elements)

The problematic lines in BNFC are thus:

RAlts str -> prPrec i 3 (concat [["["],prt 0 str,["]"]])

RMinus RAny reg@(RChar _)
-> prPrec i 3 (concat [["~["],prt 0 reg,["]"]])
RMinus RAny (RAlts str)
-> prPrec i 3 (concat [["~["],prt 0 str,["]"]])

There, instead of calling the prt function recursively, a special print function for content inside brackets should be called.
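The suggestion above could be sketched roughly as follows. The `Reg` type here is a cut-down stand-in that reuses only the constructor names visible in the quoted lines; `printReg`, `literalChar`, and `bracketChars` are hypothetical names, and the actual change in PR #321 may look different. The point is only that bracket content goes through its own print function instead of the general one.

```haskell
-- Simplified stand-in for BNFC's regular expression AST.
-- Only the constructors mentioned in the quoted lines are modelled.
data Reg
  = RChar Char       -- a single character
  | RAlts String     -- any one of the given characters
  | RAny             -- any character
  | RMinus Reg Reg   -- difference of two expressions

-- | Print a regular expression as the body of an ANTLR lexer rule.
printReg :: Reg -> String
printReg reg = case reg of
  RChar c                -> "'" ++ literalChar c ++ "'"
  RAlts cs               -> "[" ++ bracketChars cs ++ "]"
  RAny                   -> "."
  RMinus RAny (RChar c)  -> "~[" ++ bracketChars [c] ++ "]"
  RMinus RAny (RAlts cs) -> "~[" ++ bracketChars cs ++ "]"
  RMinus _ _             -> error "printReg: difference not covered by this sketch"

-- | Escaping for a character inside a quoted literal.
literalChar :: Char -> String
literalChar c
  | c `elem` "'\\" = ['\\', c]
  | otherwise      = [c]

-- | Escaping for characters inside a bracket expression:
-- only ']', '\' and '-' receive a backslash.
bracketChars :: String -> String
bracketChars = concatMap esc
  where
    esc c
      | c `elem` "]\\-" = ['\\', c]
      | otherwise       = [c]
```

In this sketch `printReg (RAlts "',")` gives `[',]` with no backslash before the quote, while `printReg (RChar '\'')` still gives `'\''`.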

@andreasabel andreasabel added this to the 2.9 milestone Nov 11, 2020
@andreasabel andreasabel linked a pull request Nov 12, 2020 that will close this issue
@andreasabel
Member

@fonfalleh : Can you test if PR #321 works for you?

@andreasabel andreasabel self-assigned this Nov 12, 2020
andreasabel added a commit that referenced this issue Nov 13, 2020
@fonfalleh
Author

> @fonfalleh : Can you test if PR #321 works for you?

Seems to work, thanks! 👍

@andreasabel
Member

Great!

@andreasabel
Member

My fix wasn't complete, see #329.
