Nothing Special   »   [go: up one dir, main page]

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

List entrypoints reverse parsed lists (Haskell/C) #163

Closed
gapag opened this issue Feb 2, 2016 · 9 comments
Closed

List entrypoints reverse parsed lists (Haskell/C) #163

gapag opened this issue Feb 2, 2016 · 9 comments
Assignees
Labels
bug entrypoints Concerning entry points for the parser and the `entrypoints` directive lists Concerning list categories and separator/terminator/delimiter pragmas
Milestone

Comments

@gapag
Copy link
gapag commented Feb 2, 2016

BNFC release 2.8, and up to the latest commit.
The Java backend is sensitive to the order of the macros/productions.
For instance the following gives errors, both using CUP and ANTLR4:

separator Exp ",";
List . Something ::=  [Exp];
A . Exp ::= "a";
B . Exp ::= "b";

Instead

List . Something ::=  [Exp];
separator Exp ",";
A . Exp ::= "a";
B . Exp ::= "b";

seems to work fine.
I did not try all the other backends, but Haskell seems to work without problem with both formulations.

@gdetrez
Copy link
Contributor
gdetrez commented Feb 4, 2016

I think the problem is not so much the order of the rules but the entry points (wich, if unspecified, will be the first category). Many backends don't seem to support having a list as an entry point.

@andreasabel
Copy link
Member

Maybe entry points could be handled in the general layer rather in the backends.

@andreasabel
Copy link
Member

In the generated cup file I find:

start with [Exp];

ListExp ::= ...

It seems that the start non-terminals are not translated.

@andreasabel andreasabel self-assigned this May 11, 2019
@andreasabel andreasabel added this to the 2.8.3 milestone May 11, 2019
@andreasabel
Copy link
Member
andreasabel commented May 11, 2019

The botched entrypoint in the cup file is easily fixed. (Use identCat instead of show.)
For the OCaml backend, just the generated Test program was faulty.

The Haskell and C backends print a parsed top-level list backwards. (Not an issue with Java and CPP.) The reason is that e.g. Haskell generates left-recursive Happy rules for lists, generating snoc-lists. These are reversed when plugged into other AST nodes, but not at the top-level.
A more principled solution may be to have rules for snoc-lists and then rules for the corresponding cons-lists that just apply the reversal function.

andreasabel added a commit that referenced this issue May 13, 2019
Ocaml: Only the generated Test file was broken.
Java: Only the name of the entrypoint needed to put right (identCat instead of show)
andreasabel added a commit that referenced this issue May 24, 2019
Ocaml: Only the generated Test file was broken.
Java: Only the name of the entrypoint needed to put right (identCat instead of show)
@andreasabel andreasabel modified the milestones: 2.8.3, 2.8.4 Aug 27, 2019
@andreasabel andreasabel changed the title Java backend does not want "separator" or "terminator" as first line List entrypoints reverse parsed lists (Haskell/C) Aug 27, 2019
@andreasabel andreasabel added the lists Concerning list categories and separator/terminator/delimiter pragmas label Nov 23, 2019
@andreasabel
Copy link
Member
andreasabel commented Nov 23, 2019

Reversed list printing in Haskell with this test case:

terminator Exp "";
Lit. Exp ::= Integer ;

I could not reproduce the problem in the simpler setting:

terminator Integer "";

It seems that then the right-to-left recursion transformation does not kick in.

UPDATE: fixed for Haskell by removing the right-to-left recursion transformation, but still open for C.

@andreasabel
Copy link
Member

That left recursion for LR parsing is strictly more performant than right recursion is not a categorical truth.
It is true that left recursion uses O(1) stack and right recursion O(n). However, the AST/parsetree stored in the single cell in case of left recursion is of size O(n) where each stack entry in case of right recursion has size O(1). So it is O(n) no matter which one is uses.

Happy uses a heap-allocated parser stack and BNFC-generated parsers produce ASTs, thus, left recursion does not save anything in comparison to right recursion.

Bison has hard stack limits (10000 if not set otherwise), thus, left-recursion may be safer. For instance, the bnfc -c generated bison parser for input separator [Integer] ""; reports

error: 9999,0: memory exhausted at 9999

We might thus remove the optimization for the Haskell backend, yet keep it for C.

See also section "A few words on performance" of blog http://gallium.inria.fr/blog/lr-lists/

andreasabel added a commit that referenced this issue Nov 23, 2019
Default is only 10.000, which seems an anachronism in 2019.

Parsing right-recursive categories needs O(n) stack size, thus,
YYMAXDEPTH is a hard limit on the depth of right recursion.

BNFC attempts to rewrite right recursion to left recursion, but only
when it can be done easily.  Thus, we might still have right recursion
left in the generated Bison grammar.
andreasabel added a commit that referenced this issue Nov 23, 2019
For LR parsers that allocate the parse stack in the heap, there is only
a minimal difference between left- and right-recursion.  Right recursion
requires O(n) stack in comparison of O(1) stack of left recursion.
However, the ASTs constructed are the same, and their nodes are stored
in the data stack, which has thus the same size for left- and right
recursion.  Only the control stack is different, but the O(n) extra
machine words are dominated by the size of the ASTs.
Asymptotic space complexity (linear) is certainly the same for both
types of recursion.
andreasabel added a commit that referenced this issue Nov 23, 2019
For LR parsers that allocate the parse stack in the heap, there is only
a minimal difference between left- and right-recursion.  Right recursion
requires O(n) stack in comparison of O(1) stack of left recursion.
However, the ASTs constructed are the same, and their nodes are stored
in the data stack, which has thus the same size for left- and right
recursion.  Only the control stack is different, but the O(n) extra
machine words are dominated by the size of the ASTs.
Asymptotic space complexity (linear) is certainly the same for both
types of recursion.

The ocamlyacc parser exhibits stack_overflow exceptions for lists of
e.g. length 1.000.000, no matter whether left or right recursion.
@andreasabel
Copy link
Member

C++ gets this list reversed:

separator Integer "";

@andreasabel
Copy link
Member

ANTLR seems to have a bug blocking this issue for the Java/ANTLR backend: antlr/antlr4#2689
The .g4 parser specification produced by BNFC looks right.

@andreasabel andreasabel added Java Java/ANTLR blocked Blocked by some other issue and removed Java C C++ labels Nov 24, 2019
andreasabel added a commit that referenced this issue Nov 24, 2019
…ormed entrypoints

This fixes the problem with reversed printing of list entrypoints which
have been subjected to the transformation from right to left recursion.
@andreasabel andreasabel removed Java/ANTLR blocked Blocked by some other issue labels Dec 30, 2019
@andreasabel
Copy link
Member

Since #272 is fixed this issue should be resolved completely.

@andreasabel andreasabel added the entrypoints Concerning entry points for the parser and the `entrypoints` directive label Mar 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug entrypoints Concerning entry points for the parser and the `entrypoints` directive lists Concerning list categories and separator/terminator/delimiter pragmas
Projects
None yet
Development

No branches or pull requests

3 participants