Nothing Special   »   [go: up one dir, main page]

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #176: Store node position in the AST (Haskell backend) #327

Closed
wants to merge 2 commits into from

Conversation

Commelina
Copy link
Contributor

Most of the code is taken and reorganized from the unmerged branch https://github.com/BNFC/bnfc/compare/176-source-position. I tested it on all three token types with/without functor. But I am not sure if it overrides some new features when generating Happy file.

@andreasabel
Copy link
Member

Great, @Commelina to work on this dearly needed feature!

I build your PR and ran the bnfc-system-tests (directory testing).

But I am not sure if it overrides some new features when generating Happy file.

Unfortunately, it seems to. Please have a look at the following tests:

Regression tests:

  • 266_define
  • 235_SymbolsOverlapTokens
  • 70_WhiteSpaceSeparator
  • 202_comments
  • 204_InternalToken

Examples:

  • Alfa
  • cubicaltt

Hint: in testing/src/ParameterizedTests.hs, move the test parameters for --haskell --functor first in the list.

, hsParams { tpName = "Haskell (with functor)"
, tpBnfcOptions = ["--haskell", "--functor"] }

If you then make in the testing directory, the relevant tests run first (the whole testsuite takes 1 hour).

@andreasabel andreasabel added Haskell position Concerning position information in parsed AST labels Dec 8, 2020
@Commelina
Copy link
Contributor Author

Hi, I have rewritten the module based on the latest code from the master branch instead of modifying branch 176-source-position. I think that it is much clearer than the unmerged branch, and minimizes the occurrence of overriding new features. Only CFtoHappy.hs is modified instead of adding extra modules in this version.

Then I ran the bnfc-system-tests (in fact the part related to Haskell) and compared the output with the one from the latest unmodified code. The result is exactly the same, which means if the unmodified version passes a test, the modified one also passes and vise versa (in fact there does exist the situation that the unmodified code fails to pass some of the tests).

Also, I tested it on my own projects and it works well.

Copy link
Member
@andreasabel andreasabel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @Commelina, this version is very good!

I noticed some problems with String literals (testcase 235_...), see below.

Further, please avoid to introduce code duplication. (The codebase of BNFC unfortunately suffers a lot from code duplication. My goal is to get rid of duplication as much as possible.) Maybe you can even find opportunities to reduce duplication in CFtoHappy in the places you modify.

[] -> "Nothing"
(Left _:_) -> "fst $1"
(Right _:_) -> "Just (tokenLineCol $1)"
getFunctorValue
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getFunctorPos duplicates code of action. Please reformulate this hunk so it does not duplicate code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it refers to getFunctorValue...? But indeed, the code needs refactoring.

After refactoring, the hunk contains three core functions actionPos, actionValue and action. And now actionValue can handle both situations with/without functor so there is less duplication. actionPos gets position information and action just composes them in different forms depends on functor:

    action
      | functor   = let pos = actionPos
                        val = actionValue
                     in "(" ++ pos ++ ", " ++ val ++ ")"
      | otherwise = actionValue
    actionPos = case rhs of
      []          -> "Nothing"
      (Left _:_)  -> "fst $1"
      (Right _:_) -> "Just (tokenLineCol $1)"
    actionValue
      | isCoercion fun = unwords metavars
      | isNilCons  fun = unwords (qualify fun : metavars)
      | functor        = unwords (qualify fun : ("(" ++ actionPos ++ ")") : metavars)
      | otherwise      = unwords (qualify fun : metavars)

mkPosPart = if functor then "Just (tokenLineCol $1)" else ""
mkValPart tokenCat =
case tokenCat of
"String" -> if functor then stringUnpack "(tokenText $1)"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testcase 235_SymbolsOverlapTokens is broken: String literals now arrive quoted in the abstract syntax. This is due to the use of tokenText here, I suppose. Note that the generated tokenText for String tokens uses the show function, which adds a level of quotation.

Copy link
Contributor Author
@Commelina Commelina Dec 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is my mistake. Indeed, the show function in tokenTest adds an extra level of quotation to String literals.

The simplest and clearest way I can find out is to replace the tokenText function with a partial one (\(PT _ (TL s)) -> s):

"String"  -> if functor then stringUnpack "((\\(PT _ (TL s)) -> s) $1)" 
                        else stringUnpack "$1"

Although the function is partial, there will never be a Tok other than TL s when "String" is matched. So it is safe, I think.

(And I finally find out why my testing gave me unexpected results because I forgot to replace bnfc with the latest build :( )

(Left _:_) -> "fst $1"
(Right _:_) -> "Just (tokenLineCol $1)"
getFunctorValue
| isCoercion fun = unwords metavars
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This duplicates some code of action. Please refactor so that no code is duplicated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried resolving this problem in the first conversation (#327 (comment)).

@andreasabel
Copy link
Member

@Commelina thanks for the fixes!

The testsuite passes now.

We can now look at the user experience (UX). I tried --functor on a tiny grammar for Hutton's razor:

EPlus. Exp  ::= Exp "+" Exp1;
EInt.  Exp1 ::= Integer;

Then I looked at the products to discover how to use the new feature.

  • Test.hs is unchanged w.r.t. the version without --functor. This is good. Yet it did not give me any hints how to use positions.
  • Abs.hs places an additional parametric value at each AST constructor
    data Exp a = EInt a Integer | EPlus a (Exp a) (Exp a)
      deriving (C.Eq, C.Ord, C.Show, C.Read)
    
    instance C.Functor Exp where
    I think the Functor instance can and should now be automatically generated by GHC using the DeriveFunctor extension. Also, Foldable and Traversable can be derived.
    The file Abs.hs should give me some clues about the positions that the parser delivers. I'd favor something along
    data Position = Position { startLine :: Int, startColumn :: Int }
    data Exp' a = EInt a Integer | EPlus a (Exp' a) (Exp' a) ...
    type Exp = Exp' Position
  • Par.y:
    %name pExp_internal Exp
    %name pExp1_internal Exp1
    ...
    %%
    
    Integer :: { (Maybe (Int, Int), Integer) }
    Integer  : L_integ  { (Just (tokenLineCol $1), (read ((tokenText $1))) :: Integer) }
    
    Exp :: { (Maybe (Int, Int),  (Razor.Abs.Exp (Maybe (Int, Int))) ) }
    Exp : Exp '+' Exp1 { (fst $1, Razor.Abs.EPlus (fst $1) (snd $1) (snd $3)) }
    
    Exp1 :: { (Maybe (Int, Int),  Razor.Abs.Exp (Maybe (Int, Int)) ) }
    Exp1 : Integer { (fst $1, Razor.Abs.EInt (fst $1) (snd $1)) }
    {
    ...
    myLexer = tokens
    pExp = (>>= return . snd) . pExp_internal
    pExp1 = (>>= return . snd) . pExp1_internal
    }

Issues with the generated parser:

  • Concerning the last lines: The simpler code is
    pExp = fmap snd . pExp_internal
  • pExp etc. should have a type signature
  • There should be a comment to point out the entrypoints at the end of the file (for better UX).
  • (Minor issue, not sure: pExp_internal: better name scheme would be pExpWithPosition etc.)
  • The Maybe (Int, Int) shouldn't be duplicated everywhere. See comment on Abs.hs how to factor out a type of Position and types for AST trees with position information.
  • Can we get Position instead of Maybe Position? I see the Nothing comes from the empty list case, but maybe there we could return the position of the following token?

From a long term maintenance perspective, making the right choices concerning the user interface to the new --functor feature is important, as these will have to stay fixed for reasons of backwards compatibility. This is why we should think carefully. Maybe this feature should go into 2.9.1 rather than 2.9.0 to give it some time to mature.

I see the status of this PR as "can be merged", but further work is needed to smoothen the UX.
What are your plans? Do you want to continue here to also engineer the UX aspects, or do you want this PR merged and be done with it?

@Commelina
Copy link
Contributor Author

Thanks for your detailed reply!

The items you listed are worthy of improvement, and I'd like to continue here. However, I am afraid that I do not have enough time in the following few days (maybe it will be better after a few weeks). In my personal opinion, to avoid conflicts in the future, I'd like to merge it now and improve it later. I will improve them as soon as I have time.

@andreasabel
Copy link
Member

Ok, I will try to release 2.9.0 soon, then merge this PR, then look at UX design.

andreasabel pushed a commit that referenced this pull request Jan 12, 2021
This is PR #327.

Effects and TODO (Andreas):
#327 (comment)

We can now look at the user experience (UX).  I tried `--functor` on a tiny grammar for Hutton's razor:
```
EPlus. Exp  ::= Exp "+" Exp1;
EInt.  Exp1 ::= Integer;
```
Then I looked at the products to discover how to use the new feature.

- `Test.hs` is unchanged w.r.t. the version without `--functor`.  This is good.  Yet it did not give me any hints how to use positions.

- `Abs.hs` places an additional parametric value at each AST constructor
  ```haskell
  data Exp a = EInt a Integer | EPlus a (Exp a) (Exp a)
    deriving (C.Eq, C.Ord, C.Show, C.Read)

  instance C.Functor Exp where
  ```
  I think the `Functor` instance can and should now be automatically generated by GHC using the `DeriveFunctor` extension. Also, `Foldable` and `Traversable` can be derived.
  The file `Abs.hs` should give me some clues about the positions that the parser delivers.  I'd favor something along
  ```haskell
  data Position = Position { startLine :: Int, startColumn :: Int }
  data Exp' a = EInt a Integer | EPlus a (Exp' a) (Exp' a) ...
  type Exp = Exp' Position
  ```

- `Par.y`:
  ```haskell
  %name pExp_internal Exp
  %name pExp1_internal Exp1
  ...
  %%

  Integer :: { (Maybe (Int, Int), Integer) }
  Integer  : L_integ  { (Just (tokenLineCol $1), (read ((tokenText $1))) :: Integer) }

  Exp :: { (Maybe (Int, Int),  (Razor.Abs.Exp (Maybe (Int, Int))) ) }
  Exp : Exp '+' Exp1 { (fst $1, Razor.Abs.EPlus (fst $1) (snd $1) (snd $3)) }

  Exp1 :: { (Maybe (Int, Int),  Razor.Abs.Exp (Maybe (Int, Int)) ) }
  Exp1 : Integer { (fst $1, Razor.Abs.EInt (fst $1) (snd $1)) }
  {
  ...
  myLexer = tokens
  pExp = (>>= return . snd) . pExp_internal
  pExp1 = (>>= return . snd) . pExp1_internal
  }
  ```

Issues with the generated parser:

- Concerning the last lines: The simpler code is
  ```haskell
  pExp = fmap snd . pExp_internal
  ```

- `pExp` etc. should have a type signature

- There should be a comment to point out the entrypoints at the end of the file (for better UX).

- (Minor issue, not sure: `pExp_internal`:  better name scheme would be `pExpWithPosition` etc.)

- The `Maybe (Int, Int)` shouldn't be duplicated everywhere.  See comment on `Abs.hs` how to factor out a type of `Position` and types for AST trees with position information.

- Can we get `Position` instead of `Maybe Position`?  I see the `Nothing` comes from the empty list case, but maybe there we could return the position of the following token?
@andreasabel
Copy link
Member

@Commelina : Your PR has been merged as commit 6d57125.
Thanks for your contribution!

@andreasabel
Copy link
Member
data Position = Position { startLine :: Int, startColumn :: Int }

I implemented something like this, but I have second thoughts because of generativity. If you have two grammars, BNFC will create two Position types. Albeit they are isomorphic, they are not the same to the compiler, and if you add some tools to manipulate positions, you need to coerce one of these types into the other. A simple type synonym

type Position = Maybe (Int, Int)

does not have these problems.

@andreasabel andreasabel added this to the 2.9.1 milestone Jan 21, 2021
@andreasabel andreasabel self-assigned this Jan 21, 2021
andreasabel added a commit that referenced this pull request Jan 21, 2021
Functor as well as Foldable and Traversable can be derived using the
proper LANGUAGE extensions.
andreasabel added a commit that referenced this pull request Jan 21, 2021
Also: parameterized version of AST is primed, unprimed is
position-carrying:

  type Exp = Exp' BNFC'Position
  data Exp' a = EPlus (Exp' a) (Exp' a) ...
andreasabel added a commit that referenced this pull request Jan 21, 2021
Also abbreviate 'Either String' to 'Err' in generated happy file.
andreasabel added a commit that referenced this pull request Jan 21, 2021
If one runs `bnfc --functor` on two grammars, one does not want two
different (yet isomorophic) definitions of the position
type (generativity problem).

As type synonyms expand to their definition, it is better to use
Maybe(Int,Int) as position type, and only define a synonym.
andreasabel added a commit that referenced this pull request Jan 22, 2021
whenever BNFC'Position is defined.

Fix-up to d90e876
andreasabel added a commit that referenced this pull request Jan 22, 2021
Input is not always 'String', can also be 'Text' or 'ByteString'.

Fix-up 7f21679.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Haskell position Concerning position information in parsed AST
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants