Nothing Special   »   [go: up one dir, main page]

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support block comment delimiters > 2 characters in Haskell backend #202

Open
andreasabel opened this issue Feb 21, 2017 · 5 comments
Open
Assignees
Labels
comments Concerning the "comment" pragma enhancement OCaml
Milestone

Comments

@andreasabel
Copy link
Member
comment "--(" "--)";

reports

Warning: comment delimiters longer than 2 characters ignored in Haskell:
--( - --)
@andreasabel andreasabel added comments Concerning the "comment" pragma enhancement labels Nov 24, 2019
@andreasabel andreasabel self-assigned this Nov 24, 2019
@andreasabel andreasabel added this to the 2.8.4 milestone Nov 24, 2019
andreasabel added a commit that referenced this issue Dec 14, 2019
The Haskell backend now supports block comment delimiters of any length.
@andreasabel
Copy link
Member Author

The C/Java backends (?Lex) did not support more than one block comment form. This has been fixed now by introducing different start conditions COMMENT, COMMENT1, ... for different block comment forms so that they do not get mixed up.

Further, single line comments should not be expected to be terminated by a newline character, as it could also be the EOF character.

andreasabel added a commit that referenced this issue Dec 15, 2019
The C/Java backends (?Lex) did not support more than one block comment
form.  This has been fixed now by introducing different start conditions
COMMENT, COMMENT1, ... for different block comment forms so that they do
not get mixed up.

Further, single line comments should not be expected to be terminated by
a newline character, as it could also be the EOF character.
andreasabel referenced this issue Dec 17, 2019
* Use PrettyPrint
* Allow comment delimiters of more than 2 characters
* Fix #111
* This also fixes some pending OCaml tests
andreasabel added a commit that referenced this issue Dec 17, 2019
See also #108:  /* **/ int x; /* */

Extra care needed to recognize "**/" as end-of-comment.
andreasabel added a commit that referenced this issue Dec 17, 2019
@andreasabel
Copy link
Member Author
andreasabel commented Dec 17, 2019

For Haskell/Java, I implemented in 83681fa a translation from block comment terminator strings to regular expressions that works correctly for C-style comments /* ... */ and HTML-style comments <!-- -->. (The previous solution 799ab5f used a translation that sat there in BNFC.Lexing that could not deal with block comment correctly.)

However, it fails when the terminator string has non-trivial repetitions, e.g., ananas. Here the generated regular expression skips over the terminator e.g. in case of anananas.

Prg. Program ::= [Integer];
terminator Integer "";

comment "banana" "ananas";

Input:

0
banana anananas
1
banana ananas
2

Output:

[Abstract Syntax]
Prg [0,2]

[Linearized tree]
0 2

The correct lexing of anananas would, after scanning anana and seeing nas, jump to have seen anan and looking at as.

This is a well-known problem in recognizing a search word of length m in a text of length n in O(n+m) time, and solutions are implemented in the lexer generators we target. Thus, maybe we should not try to reinvent the wheel.
It seems that going via regular expressions for this problem is anyway expensive, using lexer states or start conditions (such as used for lexing comments with flex and JLex have) might be better also when targeting Alex.

@andreasabel andreasabel reopened this Dec 17, 2019
andreasabel added a commit that referenced this issue Dec 18, 2019
This handles cases beyond comment terminators such as "*/" and "-->".
In the general but unlikely case, a comment terminator may have
non-trivial internal repetitions, like in "ananas".  While lexing
"anananas", we need, after having seen "anana", fall back to state
"ana", to correctly handle the rest "nas" of the input and recognize the
comment terminator.

Caveat: The Ocaml backend cannot handle the general case (like "ananas"
as a comment terminator), since ocamllex gives up on the regex we
create:

  transition table overflow, automaton is too big

Literature:
See the Knuth-Morris-Pratt algorithm of complexity O(n+m) to recognize a
keyword of length m in a text of length n.
(Dragon book second edition section 3.4.5;
Knuth/Morris/Pratt (J. Computing 1997), "Fast pattern matching on
strings").
@andreasabel
Copy link
Member Author

In the end, I did reinvent the wheel and generate a (sometimes huge) regex for recognizing the end of a block comment: 4e57e8e.
The reason is that in Alex, unlike flex, JLex etc., start conditions do not work out of the box, but need the monad wrapper. It seemed more work to change this than solving the puzzle how to generate the regex. (To my surprise, my solution worked on the first try, but I spent hours to formulate it elegantly using foldl/r rather than hacking it in with arrays and indices.)

Caveat: This regex can be too big for ocamllex to handle, which generates automata with max 2^15 transitions. It might be a problem with ocamllex as the resulting DFA should be small. Also, there might be a better way to do this with ocamllex.

Alex can a bit slow turning this regex into a DFA. I wonder whether there are faster algorithms that can deal with large regexes; the intermediate NFA Alex creates seems huge.

@andreasabel
Copy link
Member Author
andreasabel commented Oct 8, 2020

The remaining problem (see #280) is in the OCaml backend: ocamllex cannot stomach the generated regexes, creating too big automata. Filed bug at ocaml/ocaml#9964.

Alex could be faster, reported haskell/alex#163.

@andreasabel
Copy link
Member Author

Xavier Leroy recommends to use ocamllex states: ocaml/ocaml#9964 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
comments Concerning the "comment" pragma enhancement OCaml
Projects
None yet
Development

No branches or pull requests

1 participant