Antlr backend - quotation marks in bracket expressions are escaped when they shouldn't #319

fonfalleh · 2020-11-10T19:14:22Z

According to the ANTLR documentation linked below, the only characters that need to be escaped inside bracket expressions (character sets) in lexer rules seem to be ], \, and -. I'm not sure whether this means that escaping has to differ depending on the context.
https://github.com/antlr/antlr4/blob/master/doc/lexer-rules.md#lexer-rule-elements

Example token rule that generates broken code (the rule itself is by no means good or correct; I just noticed that the resulting lexer file doesn't work):
token NoteToken ["abcdefgr"]({"es"} | {"is"})*["\',"]*(digit)*["."]* ;
This results in the following line in the Lexer.g4 file:
NoteToken : [abcdefgr]('e''s'|'i''s')*[\',]*DIGIT*'.'*;
which generates the following when building:
warning(156): lily/lilyLexer.g4:83:38: invalid escape sequence \'

The build also complains about the following line:
STRINGTEXT : ~[\"\\] -> more;

bnfc/source/src/BNFC/Backend/Java/CFtoAntlr4Lexer.hs

Line 157 in 3ca7211

, "STRINGTEXT : ~[\\\"\\\\] -> more;"

The build works as expected when removing the extra backslashes as follows:
NoteToken : [abcdefgr]('e''s'|'i''s')*[',]*DIGIT*'.'*;
...
STRINGTEXT : ~["\\] -> more;

Sidenote:
I first thought this could be related to the line below, which looks like it references RegToJLex.hs instead of RegToAntlrLexer.hs, but the reference seems to be correct, even if the naming is confusing.

bnfc/source/src/BNFC/Backend/Java/CFtoAntlr4Lexer.hs

Line 150 in 3ca7211

[ text name <> " : " <> text (printRegJLex exp) <> ";"

Export from RegToAntlrLexer:

bnfc/source/src/BNFC/Backend/Java/RegToAntlrLexer.hs

Line 1 in 3ca7211

module BNFC.Backend.Java.RegToAntlrLexer (printRegJLex, escapeChar) where

andreasabel · 2020-11-11T10:41:38Z

It seems that the regular expression printer does not apply special printing rules when printing content in bracketed expressions, but maybe it should, according to the rules you quoted above:

The following escaped characters are interpreted as single special characters: \n, \r, \b, \t, \f, \uXXXX, and \u{XXXXXX}. To get ], \, or - you must escape them with \.
(From https://github.com/antlr/antlr4/blob/master/doc/lexer-rules.md#lexer-rule-elements)

The problematic lines in BNFC are thus:

bnfc/source/src/BNFC/Backend/Java/RegToAntlrLexer.hs

Line 79 in 3ca7211

RAlts str -> prPrec i 3 (concat [["["],prt 0 str,["]"]])

bnfc/source/src/BNFC/Backend/Java/RegToAntlrLexer.hs

Lines 69 to 72 in 3ca7211

    
           RMinus RAny reg@(RChar _) 
        
                      ->  prPrec i 3 (concat [["~["],prt 0 reg,["]"]]) 
        
           RMinus RAny (RAlts str) 
        
                      ->  prPrec i 3 (concat [["~["],prt 0 str,["]"]])

There, instead of calling the prt function recursively, a special print function for content inside brackets should be called.
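
A minimal sketch of what such a bracket-specific printer could look like, following the escaping rules quoted above. This is not the code from PR #321; the names escapeInSet, charSet and negCharSet are hypothetical and only illustrate the idea that quotes stay unescaped inside [...] while ], \ and - get a backslash.

```haskell
-- Hypothetical helpers, not part of BNFC's RegToAntlrLexer module.

-- | Escape one character for use inside an ANTLR 4 character set [...].
-- Only ], \ and - need a backslash; a few control characters are
-- written with their usual escape sequences. Quotes are left alone.
escapeInSet :: Char -> String
escapeInSet c = case c of
  ']'  -> "\\]"
  '\\' -> "\\\\"
  '-'  -> "\\-"
  '\n' -> "\\n"
  '\r' -> "\\r"
  '\t' -> "\\t"
  '\b' -> "\\b"
  '\f' -> "\\f"
  _    -> [c]

-- | Render a list of alternatives as a character set.
charSet :: String -> String
charSet cs = "[" ++ concatMap escapeInSet cs ++ "]"

-- | Render the complement of a character set.
negCharSet :: String -> String
negCharSet cs = "~" ++ charSet cs

main :: IO ()
main = mapM_ putStrLn
  [ charSet "',"        -- prints [',]
  , negCharSet "\"\\"   -- prints ~["\\]
  , charSet "a-z]"      -- prints [a\-z\]]
  ]
```

The first two outputs match the hand-corrected lines from the report. In the printer, the RAlts and RMinus RAny cases would then call something like charSet on the characters instead of prt 0.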

andreasabel · 2020-11-12T23:36:21Z

@fonfalleh : Can you test if PR #321 works for you?

fonfalleh · 2020-11-15T17:46:47Z

Seems to work, thanks! 👍

andreasabel · 2020-11-17T17:03:59Z

Great!

andreasabel · 2021-01-01T08:20:52Z

My fix wasn't complete, see #329.

andreasabel added Java/ANTLR lexer Concerning the generated lexer bug labels Nov 11, 2020

andreasabel added this to the 2.9 milestone Nov 11, 2020

andreasabel linked a pull request Nov 12, 2020 that will close this issue

[ fixed #319 ] proper escaping in ANTLR-Lexer definition #321

Merged

andreasabel self-assigned this Nov 12, 2020

andreasabel closed this as completed in 404c181 Nov 13, 2020

andreasabel closed this as completed in #321 Nov 13, 2020

andreasabel added a commit that referenced this issue Nov 13, 2020

[ #319 ] regression test

046fd09

andreasabel mentioned this issue Dec 30, 2020

ANTLR backend: Spurious escaping of single quotes in character sets #329

Closed
