-
Notifications
You must be signed in to change notification settings - Fork 165
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support block comment delimiters > 2 characters in Haskell backend #202
Comments
The Haskell backend now supports block comment delimiters of any length.
The C/Java backends (?Lex) did not support more than one block comment form. This has been fixed now by introducing different start conditions COMMENT, COMMENT1, ... for different block comment forms so that they do not get mixed up. Further, single line comments should not be expected to be terminated by a newline character, as it could also be the EOF character. |
The C/Java backends (?Lex) did not support more than one block comment form. This has been fixed now by introducing different start conditions COMMENT, COMMENT1, ... for different block comment forms so that they do not get mixed up. Further, single line comments should not be expected to be terminated by a newline character, as it could also be the EOF character.
* Use PrettyPrint * Allow comment delimiters of more than 2 characters * Fix #111 * This also fixes some pending OCaml tests
See also #108: /* **/ int x; /* */ Extra care needed to recognize "**/" as end-of-comment.
For Haskell/Java, I implemented in 83681fa a translation from block comment terminator strings to regular expressions that works correctly for C-style comments However, it fails when the terminator string has non-trivial repetitions, e.g.,
Input:
Output:
The correct lexing of This is a well-known problem in recognizing a search word of length m in a text of length n in O(n+m) time, and solutions are implemented in the lexer generators we target. Thus, maybe we should not try to reinvent the wheel. |
This handles cases beyond comment terminators such as "*/" and "-->". In the general but unlikely case, a comment terminator may have non-trivial internal repetitions, like in "ananas". While lexing "anananas", we need, after having seen "anana", fall back to state "ana", to correctly handle the rest "nas" of the input and recognize the comment terminator. Caveat: The Ocaml backend cannot handle the general case (like "ananas" as a comment terminator), since ocamllex gives up on the regex we create: transition table overflow, automaton is too big Literature: See the Knuth-Morris-Pratt algorithm of complexity O(n+m) to recognize a keyword of length m in a text of length n. (Dragon book second edition section 3.4.5; Knuth/Morris/Pratt (J. Computing 1997), "Fast pattern matching on strings").
In the end, I did reinvent the wheel and generate a (sometimes huge) regex for recognizing the end of a block comment: 4e57e8e. Caveat: This regex can be too big for Alex can a bit slow turning this regex into a DFA. I wonder whether there are faster algorithms that can deal with large regexes; the intermediate NFA Alex creates seems huge. |
The remaining problem (see #280) is in the OCaml backend: Alex could be faster, reported haskell/alex#163. |
Xavier Leroy recommends to use ocamllex states: ocaml/ocaml#9964 (comment) |
reports
The text was updated successfully, but these errors were encountered: