C-family backends generate incorrect parser #235

Closed
justinmeiners opened this issue Oct 2, 2018 · 4 comments

@justinmeiners
Contributor
justinmeiners commented Oct 2, 2018

Overview

This grammar is used to generate parsers in several languages. The generated parser works for Java, but not for C or C++.
Here is the error when building the C++ output:

[Screenshot of the build error: screenshot_from_2018-09-21_17-48-19]

Observations

I have tracked the issue down to a mix-up of SimpleTypeBool and GroundBool (the same happens for a few other variables).

A Bool is defined to be either true or false and is part of the Ground nonterminal:

BoolTrue.   Bool ::= "true" ;
BoolFalse.  Bool ::= "false" ;
GroundBool. Ground ::= Bool ;

SimpleTypeBool, on the other hand, is defined to be the token "Bool":

SimpleTypeBool. SimpleType ::= "Bool" ;

Clearly these are not the same. The token true or false should be parsed as a GroundBool, and the token Bool should be parsed as a SimpleTypeBool. However, these cases are mixed up in the lex and yacc files.

In the lex file, I can see the token for SimpleTypeBool:

<YYINITIAL>"Bool"        return _SYMB_38;

But, _SYMB_38 is referenced in the ground rule in the bison file:

 Ground : _SYMB_38 {  $$ = new GroundBool($1); $$->line_number = yy_mylinenumber; YY_RESULT_Ground_= $$; 

This is incorrect. _SYMB_38 has nothing to do with Ground. The rule should reference Bool instead of _SYMB_38; Bool itself is defined properly:

Bool : _SYMB_60 {  $$ = new BoolTrue(); $$->line_number = yy_mylinenumber; YY_RESULT_Bool_= $$; }
| _SYMB_50 {  $$ = new BoolFalse(); $$->line_number = yy_mylinenumber; YY_RESULT_Bool_= $$; }

Where the Problem might be

I am not a Haskeller, but I am working on tracking down the problem. My guess is that there is a Map somewhere which uses the same key for a token literal and for a token name.

I found something like that here:
https://github.com/BNFC/bnfc/blob/master/source/src/BNFC/Backend/CPP/NoSTL/CFtoFlex.hs#L58

If I trace env' I can see that there are duplicate keys. Notice Uri, which appears as both _SYMB_44 and _SYMB_62.

[("{","_SYMB_0"),("}","_SYMB_1"),("~","_SYMB_2"),("/\\","_SYMB_3"),("\\/","_SYMB_4"),("*","_SYMB_5"),(".","_SYMB_6"),("(","_SYMB_7"),(")","_SYMB_8"),("-","_SYMB_9"),("/","_SYMB_10"),("%%","_SYMB_11"),("+","_SYMB_12"),("++","_SYMB_13"),("--","_SYMB_14"),("<","_SYMB_15"),("<=","_SYMB_16"),(">","_SYMB_17"),(">=","_SYMB_18"),("==","_SYMB_19"),("!=","_SYMB_20"),("=","_SYMB_21"),("|","_SYMB_22"),(",","_SYMB_23"),("_","_SYMB_24"),("@","_SYMB_25"),("bundle+","_SYMB_26"),("bundle-","_SYMB_27"),("<-","_SYMB_28"),(";","_SYMB_29"),("!","_SYMB_30"),("!!","_SYMB_31"),("=>","_SYMB_32"),("[","_SYMB_33"),("]","_SYMB_34"),(":","_SYMB_35"),(",)","_SYMB_36"),("...","_SYMB_37"),("Bool","_SYMB_38"),("ByteArray","_SYMB_39"),("Int","_SYMB_40"),("Nil","_SYMB_41"),("Set","_SYMB_42"),("String","_SYMB_43"),("Uri","_SYMB_44"),("and","_SYMB_45"),("bundle","_SYMB_46"),("bundle0","_SYMB_47"),("contract","_SYMB_48"),("else","_SYMB_49"),("false","_SYMB_50"),("for","_SYMB_51"),("if","_SYMB_52"),("in","_SYMB_53"),("match","_SYMB_54"),("matches","_SYMB_55"),("new","_SYMB_56"),("not","_SYMB_57"),("or","_SYMB_58"),("select","_SYMB_59"),("true","_SYMB_60"),("Long","_SYMB_61"),("Uri","_SYMB_62"),("Var","_SYMB_63")]

I imagine something similar is happening in the bison generation code. I will keep tracking this down, but if you know what is wrong or have any suggestions that would be very helpful. I can also clarify if you have questions.
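
A minimal sketch of the suspected failure mode, not the actual BNFC code: when an association list holds the same key twice, `lookup` from the Prelude returns only the first entry, so the second one is unreachable by key. The `env` below is a hand-picked excerpt of the trace above; `duplicateKeys` and `firstUri` are hypothetical names, and it is an assumption that the second Uri entry stands for a token type rather than a keyword.

```haskell
import Data.List (group, sort)

-- Hand-picked excerpt of the traced environment. Assumption: the first
-- "Uri" is a keyword literal and the second one is a token type.
env :: [(String, String)]
env =
  [ ("Bool", "_SYMB_38")
  , ("Uri",  "_SYMB_44")
  , ("true", "_SYMB_60")
  , ("Long", "_SYMB_61")
  , ("Uri",  "_SYMB_62")
  ]

-- Keys that occur more than once in an association list.
duplicateKeys :: [(String, a)] -> [String]
duplicateKeys kvs = [ k | (k : _ : _) <- group (sort (map fst kvs)) ]

-- lookup stops at the first match, so the second "Uri" entry
-- can never be found through its key.
firstUri :: Maybe String
firstUri = lookup "Uri" env
```

Here `duplicateKeys env` reports `"Uri"` and `firstUri` resolves to the `_SYMB_44` entry, which would match the symptom of a keyword symbol being used where a different name was meant.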

justinmeiners changed the title from "C++ and C backend can generate incorrect output." to "C++ and C backend generates incorrect output." Oct 2, 2018
justinmeiners changed the title from "C++ and C backend generates incorrect output." to "C++ and C backend generates incorrect parser." Oct 2, 2018
@andreasabel
Member
andreasabel commented May 13, 2019

Confirmed, and your hunch about where the problem lies is correct.

Someone stumbled upon a similar issue in the C# backend (see the comment in the code below), but failed to address the root of the problem.

Left c -> case lookup (show c) env of
  -- This used to be x, but that didn't work if we had a symbol "String" in env, and tried to use a normal String - it would use the symbol...
  Just x | not (isPositionCat cf c) && (show c) `notElem` (map fst basetypes) -> x
  _ -> typeName (identCat c)

The SymEnv map should really keep keywords/symbols and token types apart.

Here is a small test grammar to expose the clashes:

Init.        Main ::= Type Integer Double Char String Ident;

TypeType.    Type ::= "Type"    ;  -- #235 conflict with non-terminal Type
TypeInteger. Type ::= "Integer" ;  -- #235 conflict with token type Integer
TypeDouble.  Type ::= "Double"  ;  -- etc.
TypeChar.    Type ::= "Char"    ;
TypeString.  Type ::= "String"  ;
TypeIdent.   Type ::= "Ident"   ;
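
One way to keep keywords/symbols and token types apart is to tag each key with its namespace. The following is only a sketch under that assumption, not the fix that was committed to BNFC: `SymKey`, `SymEnv` as defined here, `mkSymEnv`, and `example` are hypothetical names chosen for illustration.

```haskell
import qualified Data.Map as Map

-- Hypothetical key type: the constructor records the namespace of a name.
data SymKey
  = Keyword String    -- a literal such as "Integer" in Type ::= "Integer"
  | TokenType String  -- a token type such as Integer
  deriving (Eq, Ord, Show)

type SymEnv = Map.Map SymKey String

-- Number the keys in order, mimicking the _SYMB_n naming scheme.
mkSymEnv :: [SymKey] -> SymEnv
mkSymEnv keys =
  Map.fromList (zip keys (map (\n -> "_SYMB_" ++ show n) [0 :: Int ..]))

-- The keyword "Integer" and the token type Integer no longer collide.
example :: SymEnv
example = mkSymEnv [Keyword "Integer", TokenType "Double", TokenType "Integer"]
```

With this layout, `Map.lookup (Keyword "Integer") example` and `Map.lookup (TokenType "Integer") example` return different symbols, because the two keys differ in their constructor even though the strings are equal.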

andreasabel self-assigned this May 13, 2019
andreasabel added this to the 2.8.3 milestone May 13, 2019
andreasabel added a commit that referenced this issue May 24, 2019
andreasabel modified the milestones: 2.8.3, 2.8.4 Aug 27, 2019
@andreasabel
Member

The OCaml backend is also affected: TOK_Char is used both for the keyword "Char" and for the token type Char.

andreasabel changed the title from "C++ and C backend generates incorrect parser." to "C -family backends generate incorrect parser" Jan 2, 2020
@justinmeiners
Contributor Author

Glad to see this is getting addressed! I wish I was still working with that project so I could help out more.

@andreasabel
Member

Thanks for the thumbs up! I started fixing this yesterday evening and will commit very soon!

andreasabel added a commit that referenced this issue Oct 3, 2020
Currently do not work for all backends.  Problems with:

* agda
* cpp-nostl
* cpp with namespace
* java -jflex