-
Notifications
You must be signed in to change notification settings - Fork 165
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Question: Using functor to store position information #176
Comments
If I read the bnfc sources correctly source positions are not supported by the happy backend of bnfc. Actually the lexer generated by alex yields a list of tokens which are annotated with their source positions, but the happy parser ignores them. |
@pascalh is correct, the help comment describe the intended behavior but I still need to find time to complete the feature. The functor part works but not the position part... Sorry for the inconsistent documentation 😕. |
Any updates on this @gdetrez ? |
@Centril partly: I have done some work on this but I was interrupted by other things... I'll try to rebase and push what I already have this week and see if I can finish it soon-ish. Would you be able to provide some feedback? |
@gdetrez Sure, what type of feedback do you want? Peer review or testing (or both)? Just give me a link to the commit at which point to test it |
When using bnfc to generate haskell code with the --functor option, the parser will no store the begining position of the corresponding code in each node. This might be useful, for instance, to report errors. Note that this is possibly a larger change than it had to be because I tried something new: instead of generating code by putting together strings (or the slightly fancier PrettyPrint.Doc), I refactored the code that makes happy production to generate an AST which is then pretty-printed. Both the AST types and the pretty printer being of course generated by BNFC itself! This seems like an obvious thing to do but for some reason we never did. The main advantage is that it makes much better use of the type checker (whene everything is a string, it doesn't mean much that the type checking passes...). In addition, you get much cleaner code by getting rid of all the separators, keywords, symbols and parentheses that are now inserted by the pretty printer.
When using bnfc to generate haskell code with the --functor option, the parser will no store the begining position of the corresponding code in each node. This might be useful, for instance, to report errors. Note that this is possibly a larger change than it had to be because I tried something new: instead of generating code by putting together strings (or the slightly fancier PrettyPrint.Doc), I refactored the code that makes happy production to generate an AST which is then pretty-printed. Both the AST types and the pretty printer being of course generated by BNFC itself! This seems like an obvious thing to do but for some reason we never did. The main advantage is that it makes much better use of the type checker (whene everything is a string, it doesn't mean much that the type checking passes...). In addition, you get much cleaner code by getting rid of all the separators, keywords, symbols and parentheses that are now inserted by the pretty printer.
Is there any chance of getting this merged? Can I help somehow? |
When using bnfc to generate haskell code with the --functor option, the parser will no store the begining position of the corresponding code in each node. This might be useful, for instance, to report errors. Note that this is possibly a larger change than it had to be because I tried something new: instead of generating code by putting together strings (or the slightly fancier PrettyPrint.Doc), I refactored the code that makes happy production to generate an AST which is then pretty-printed. Both the AST types and the pretty printer being of course generated by BNFC itself! This seems like an obvious thing to do but for some reason we never did. The main advantage is that it makes much better use of the type checker (whene everything is a string, it doesn't mean much that the type checking passes...). In addition, you get much cleaner code by getting rid of all the separators, keywords, symbols and parentheses that are now inserted by the pretty printer.
When using bnfc to generate haskell code with the --functor option, the parser will no store the begining position of the corresponding code in each node. This might be useful, for instance, to report errors. Note that this is possibly a larger change than it had to be because I tried something new: instead of generating code by putting together strings (or the slightly fancier PrettyPrint.Doc), I refactored the code that makes happy production to generate an AST which is then pretty-printed. Both the AST types and the pretty printer being of course generated by BNFC itself! This seems like an obvious thing to do but for some reason we never did. The main advantage is that it makes much better use of the type checker (whene everything is a string, it doesn't mean much that the type checking passes...). In addition, you get much cleaner code by getting rid of all the separators, keywords, symbols and parentheses that are now inserted by the pretty printer.
When using bnfc to generate haskell code with the --functor option, the parser will no store the begining position of the corresponding code in each node. This might be useful, for instance, to report errors. Note that this is possibly a larger change than it had to be because I tried something new: instead of generating code by putting together strings (or the slightly fancier PrettyPrint.Doc), I refactored the code that makes happy production to generate an AST which is then pretty-printed. Both the AST types and the pretty printer being of course generated by BNFC itself! This seems like an obvious thing to do but for some reason we never did. The main advantage is that it makes much better use of the type checker (whene everything is a string, it doesn't mean much that the type checking passes...). In addition, you get much cleaner code by getting rid of all the separators, keywords, symbols and parentheses that are now inserted by the pretty printer.
Any news on this? This would be a very useful feature! |
Seems like this feature sits on an unmerged branch https://github.com/BNFC/bnfc/compare/176-source-position which my predecessor left behind. After three years of course it does not merge cleanly any more. I don't know if there was a reason this was not finished. I'd be happy for someone to pick it up and integrate it with the current source of BNFC. |
This is PR #327. Effects and TODO (Andreas): #327 (comment) We can now look at the user experience (UX). I tried `--functor` on a tiny grammar for Hutton's razor: ``` EPlus. Exp ::= Exp "+" Exp1; EInt. Exp1 ::= Integer; ``` Then I looked at the products to discover how to use the new feature. - `Test.hs` is unchanged w.r.t. the version without `--functor`. This is good. Yet it did not give me any hints how to use positions. - `Abs.hs` places an additional parametric value at each AST constructor ```haskell data Exp a = EInt a Integer | EPlus a (Exp a) (Exp a) deriving (C.Eq, C.Ord, C.Show, C.Read) instance C.Functor Exp where ``` I think the `Functor` instance can and should now be automatically generated by GHC using the `DeriveFunctor` extension. Also, `Foldable` and `Traversable` can be derived. The file `Abs.hs` should give me some clues about the positions that the parser delivers. I'd favor something along ```haskell data Position = Position { startLine :: Int, startColumn :: Int } data Exp' a = EInt a Integer | EPlus a (Exp' a) (Exp' a) ... type Exp = Exp' Position ``` - `Par.y`: ```haskell %name pExp_internal Exp %name pExp1_internal Exp1 ... %% Integer :: { (Maybe (Int, Int), Integer) } Integer : L_integ { (Just (tokenLineCol $1), (read ((tokenText $1))) :: Integer) } Exp :: { (Maybe (Int, Int), (Razor.Abs.Exp (Maybe (Int, Int))) ) } Exp : Exp '+' Exp1 { (fst $1, Razor.Abs.EPlus (fst $1) (snd $1) (snd $3)) } Exp1 :: { (Maybe (Int, Int), Razor.Abs.Exp (Maybe (Int, Int)) ) } Exp1 : Integer { (fst $1, Razor.Abs.EInt (fst $1) (snd $1)) } { ... myLexer = tokens pExp = (>>= return . snd) . pExp_internal pExp1 = (>>= return . snd) . pExp1_internal } ``` Issues with the generated parser: - Concerning the last lines: The simpler code is ```haskell pExp = fmap snd . pExp_internal ``` - `pExp` etc. should have a type signature - There should be a comment to point out the entrypoints at the end of the file (for better UX). - (Minor issue, not sure: `pExp_internal`: better name scheme would be `pExpWithPosition` etc.) - The `Maybe (Int, Int)` shouldn't be duplicated everywhere. See comment on `Abs.hs` how to factor out a type of `Position` and types for AST trees with position information. - Can we get `Position` instead of `Maybe Position`? I see the `Nothing` comes from the empty list case, but maybe there we could return the position of the following token?
This has been implemented by @Commelina in #327 and will be released with 2.9.1. |
Functor as well as Foldable and Traversable can be derived using the proper LANGUAGE extensions.
Also: parameterized version of AST is primed, unprimed is position-carrying: type Exp = Exp' BNFC'Position data Exp' a = EPlus (Exp' a) (Exp' a) ...
Also abbreviate 'Either String' to 'Err' in generated happy file.
If one runs `bnfc --functor` on two grammars, one does not want two different (yet isomorophic) definitions of the position type (generativity problem). As type synonyms expand to their definition, it is better to use Maybe(Int,Int) as position type, and only define a synonym.
Released in 2.9.1. |
According to
bnfc --help
:--functor Make the AST a functor and use it to store the position of the nodes
How would you go about storing the position of the nodes?
The type that
yields is:
I.e: no position information is added... So does that mean that you have to use something else than
happy
, or can you get it to add that position information for you?The text was updated successfully, but these errors were encountered: