-
Notifications
You must be signed in to change notification settings - Fork 165
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix #176: Store node position in the AST (Haskell backend) #327
Conversation
Great, @Commelina to work on this dearly needed feature! I build your PR and ran the
Unfortunately, it seems to. Please have a look at the following tests: Regression tests:
Examples:
Hint: in bnfc/testing/src/ParameterizedTests.hs Lines 332 to 333 in e7e68ba
If you then make in the testing directory, the relevant tests run first (the whole testsuite takes 1 hour).
|
Hi, I have rewritten the module based on the latest code from the Then I ran the Also, I tested it on my own projects and it works well. |
9894e58
to
8e790b3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @Commelina, this version is very good!
I noticed some problems with String
literals (testcase 235_...
), see below.
Further, please avoid to introduce code duplication. (The codebase of BNFC unfortunately suffers a lot from code duplication. My goal is to get rid of duplication as much as possible.) Maybe you can even find opportunities to reduce duplication in CFtoHappy
in the places you modify.
[] -> "Nothing" | ||
(Left _:_) -> "fst $1" | ||
(Right _:_) -> "Just (tokenLineCol $1)" | ||
getFunctorValue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getFunctorPos
duplicates code of action
. Please reformulate this hunk so it does not duplicate code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it refers to getFunctorValue
...? But indeed, the code needs refactoring.
After refactoring, the hunk contains three core functions actionPos
, actionValue
and action
. And now actionValue
can handle both situations with/without functor so there is less duplication. actionPos
gets position information and action
just composes them in different forms depends on functor
:
action
| functor = let pos = actionPos
val = actionValue
in "(" ++ pos ++ ", " ++ val ++ ")"
| otherwise = actionValue
actionPos = case rhs of
[] -> "Nothing"
(Left _:_) -> "fst $1"
(Right _:_) -> "Just (tokenLineCol $1)"
actionValue
| isCoercion fun = unwords metavars
| isNilCons fun = unwords (qualify fun : metavars)
| functor = unwords (qualify fun : ("(" ++ actionPos ++ ")") : metavars)
| otherwise = unwords (qualify fun : metavars)
mkPosPart = if functor then "Just (tokenLineCol $1)" else "" | ||
mkValPart tokenCat = | ||
case tokenCat of | ||
"String" -> if functor then stringUnpack "(tokenText $1)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Testcase 235_SymbolsOverlapTokens
is broken: String
literals now arrive quoted in the abstract syntax. This is due to the use of tokenText
here, I suppose. Note that the generated tokenText
for String
tokens uses the show
function, which adds a level of quotation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is my mistake. Indeed, the show
function in tokenTest
adds an extra level of quotation to String
literals.
The simplest and clearest way I can find out is to replace the tokenText
function with a partial one (\(PT _ (TL s)) -> s)
:
"String" -> if functor then stringUnpack "((\\(PT _ (TL s)) -> s) $1)"
else stringUnpack "$1"
Although the function is partial, there will never be a Tok
other than TL s
when "String"
is matched. So it is safe, I think.
(And I finally find out why my testing gave me unexpected results because I forgot to replace bnfc
with the latest build :( )
(Left _:_) -> "fst $1" | ||
(Right _:_) -> "Just (tokenLineCol $1)" | ||
getFunctorValue | ||
| isCoercion fun = unwords metavars |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This duplicates some code of action
. Please refactor so that no code is duplicated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried resolving this problem in the first conversation (#327 (comment)).
@Commelina thanks for the fixes! The testsuite passes now. We can now look at the user experience (UX). I tried
Then I looked at the products to discover how to use the new feature.
Issues with the generated parser:
From a long term maintenance perspective, making the right choices concerning the user interface to the new I see the status of this PR as "can be merged", but further work is needed to smoothen the UX. |
Thanks for your detailed reply! The items you listed are worthy of improvement, and I'd like to continue here. However, I am afraid that I do not have enough time in the following few days (maybe it will be better after a few weeks). In my personal opinion, to avoid conflicts in the future, I'd like to merge it now and improve it later. I will improve them as soon as I have time. |
Ok, I will try to release 2.9.0 soon, then merge this PR, then look at UX design. |
This is PR #327. Effects and TODO (Andreas): #327 (comment) We can now look at the user experience (UX). I tried `--functor` on a tiny grammar for Hutton's razor: ``` EPlus. Exp ::= Exp "+" Exp1; EInt. Exp1 ::= Integer; ``` Then I looked at the products to discover how to use the new feature. - `Test.hs` is unchanged w.r.t. the version without `--functor`. This is good. Yet it did not give me any hints how to use positions. - `Abs.hs` places an additional parametric value at each AST constructor ```haskell data Exp a = EInt a Integer | EPlus a (Exp a) (Exp a) deriving (C.Eq, C.Ord, C.Show, C.Read) instance C.Functor Exp where ``` I think the `Functor` instance can and should now be automatically generated by GHC using the `DeriveFunctor` extension. Also, `Foldable` and `Traversable` can be derived. The file `Abs.hs` should give me some clues about the positions that the parser delivers. I'd favor something along ```haskell data Position = Position { startLine :: Int, startColumn :: Int } data Exp' a = EInt a Integer | EPlus a (Exp' a) (Exp' a) ... type Exp = Exp' Position ``` - `Par.y`: ```haskell %name pExp_internal Exp %name pExp1_internal Exp1 ... %% Integer :: { (Maybe (Int, Int), Integer) } Integer : L_integ { (Just (tokenLineCol $1), (read ((tokenText $1))) :: Integer) } Exp :: { (Maybe (Int, Int), (Razor.Abs.Exp (Maybe (Int, Int))) ) } Exp : Exp '+' Exp1 { (fst $1, Razor.Abs.EPlus (fst $1) (snd $1) (snd $3)) } Exp1 :: { (Maybe (Int, Int), Razor.Abs.Exp (Maybe (Int, Int)) ) } Exp1 : Integer { (fst $1, Razor.Abs.EInt (fst $1) (snd $1)) } { ... myLexer = tokens pExp = (>>= return . snd) . pExp_internal pExp1 = (>>= return . snd) . pExp1_internal } ``` Issues with the generated parser: - Concerning the last lines: The simpler code is ```haskell pExp = fmap snd . pExp_internal ``` - `pExp` etc. should have a type signature - There should be a comment to point out the entrypoints at the end of the file (for better UX). - (Minor issue, not sure: `pExp_internal`: better name scheme would be `pExpWithPosition` etc.) - The `Maybe (Int, Int)` shouldn't be duplicated everywhere. See comment on `Abs.hs` how to factor out a type of `Position` and types for AST trees with position information. - Can we get `Position` instead of `Maybe Position`? I see the `Nothing` comes from the empty list case, but maybe there we could return the position of the following token?
@Commelina : Your PR has been merged as commit 6d57125. |
I implemented something like this, but I have second thoughts because of generativity. If you have two grammars, BNFC will create two
does not have these problems. |
Functor as well as Foldable and Traversable can be derived using the proper LANGUAGE extensions.
Also: parameterized version of AST is primed, unprimed is position-carrying: type Exp = Exp' BNFC'Position data Exp' a = EPlus (Exp' a) (Exp' a) ...
Also abbreviate 'Either String' to 'Err' in generated happy file.
If one runs `bnfc --functor` on two grammars, one does not want two different (yet isomorophic) definitions of the position type (generativity problem). As type synonyms expand to their definition, it is better to use Maybe(Int,Int) as position type, and only define a synonym.
Most of the code is taken and reorganized from the unmerged branch https://github.com/BNFC/bnfc/compare/176-source-position. I tested it on all three token types with/without functor. But I am not sure if it overrides some new features when generating Happy file.