Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3314221.3314625acmconferencesArticle/Chapter ViewAbstractPublication PagespldiConference Proceedingsconference-collections
research-article

A typed, algebraic approach to parsing

Published: 08 June 2019 Publication History

Abstract

In this paper, we recall the definition of the context-free expressions (or µ-regular expressions), an algebraic presentation of the context-free languages. Then, we define a core type system for the context-free expressions which gives a compositional criterion for identifying those context-free expressions which can be parsed unambiguously by predictive algorithms in the style of recursive descent or LL(1).
Next, we show how these typed grammar expressions can be used to derive a parser combinator library which both guarantees linear-time parsing with no backtracking and single-token lookahead, and which respects the natural denotational semantics of context-free expressions. Finally, we show how to exploit the type information to write a staged version of this library, which produces dramatic increases in performance, even outperforming code generated by the standard parser generator tool ocamlyacc.

References

[1]
Mads Sig Ager, Dariusz Biernacki, Olivier Danvy, and Jan Midtgaard. 2003. From Interpreter to Compiler and Virtual Machine: a Functional Derivation. Technical Report BRICS RS-03-14. Aarhus, Denmark.
[2]
Robert Atkey, Sam Lindley, and Jeremy Yallop. 2009. Unembedding Domain-specific Languages. In Proceedings of the 2Nd ACM SIGPLAN Symposium on Haskell (Haskell '09). ACM, New York, NY, USA, 37-48.
[3]
Robert Atkey and Conor McBride. 2013. Productive Coprogramming with Guarded Recursion. In Proceedings of the 18th ACM SIGPLAN International Conference on Functional Programming (ICFP '13). ACM, New York, NY, USA, 197-208.
[4]
H. Beki? and C. B. Jones (Eds.). 1984. Programming Languages and Their Definition: Selected Papers of H. Beki?. LNCS, Vol. 177. Springer-Verlag.
[5]
Anders Bondorf. 1992. Improving Binding Times Without Explicit CPS-conversion. In Proceedings of the 1992 ACM Conference on LISP and Functional Programming (LFP '92). ACM, New York, NY, USA, 1-10.
[6]
Anne Brüggemann-Klein and Derick Wood. 1992. Deterministic Regular Languages. In 9th Annual Symposium on Theoretical Aspects of Computer Science (STACS 92). 173-184.
[7]
Janusz A. Brzozowski. 1964. Derivatives of Regular Expressions. J. ACM 11, 4 (Oct. 1964), 481-494.
[8]
Olivier Danvy, Karoline Malmkjær, and Jens Palsberg. 1996. Eta-Expansion Does The Trick. ACM Trans. Program. Lang. Syst. 18, 6 (1996), 730-751.
[9]
Rowan Davies and Frank Pfenning. 1996. A Modal Analysis of Staged Computation. In Proceedings of the 23rd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL '96). ACM, New York, NY, USA, 258-270.
[10]
Niels Bjørn Bugge Grathwohl, Fritz Henglein, and Dexter Kozen. 2014. Infinitary Axiomatization of the Equational Theory of Context-Free Languages. In Fixed Points in Computer Science (FICS 2013), Vol. 126. 44-55.
[11]
Dick Grune and Ceriel J.H. Jacobs. 2007. Parsing Techniques: A Practical Guide (2 ed.). Springer Science, New York, NY 10013, USA.
[12]
Christopher S. Hardin and Roshan P. James. 2013. Core_bench: Micro-Benchmarking for OCaml. OCaml Workshop.
[13]
Graham Hutton. 1992. Higher-Order Functions for Parsing. Journal of Functional Programming 2, 3 (001 007 1992), 323-343.
[14]
Jun Inoue. 2014. Supercompiling with Staging. In Fourth International Valentin Turchin Workshop on Metacomputation.
[15]
Clinton L. Jeffery. 2003. Generating LR Syntax Error Messages from Examples. ACM Trans. Program. Lang. Syst. 25, 5 (Sept. 2003), 631-640.
[16]
Adrian Johnstone and Elizabeth Scott. 1998. Generalised Recursive Descent parsing and Fellow-Determinism. In CC (Lecture Notes in Computer Science), Vol. 1383. Springer, 16-30.
[17]
Manohar Jonnalagedda, Thierry Coppey, Sandro Stucki, Tiark Rompf, and Martin Odersky. 2014. Staged Parser Combinators for Efficient Data Processing. In Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications (OOPSLA '14). ACM, New York, NY, USA, 637-653.
[18]
Oleg Kiselyov. 2014. The Design and Implementation of BER MetaOCaml. In Functional and Logic Programming, Michael Codish and Eijiro Sumii (Eds.). Springer International Publishing, Cham, 86-102.
[19]
Donald E Knuth. 1965. On the Translation of Languages from Left to Right. Information and Control 8, 6 (1965), 607-639.
[20]
Dexter Kozen. 1990. On Kleene Algebras and Closed Semirings. In International Symposium on Mathematical Foundations of Computer Science. Springer, 26-47.
[21]
Neelakantan R. Krishnaswami. 2013. Higher-Order functional Reactive Programming without Spacetime Leaks. In ACM SIGPLAN International Conference on Functional Programming, ICFP'13, Boston, MA, USA - September 25-27, 2013. 221-232.
[22]
Hans Leiß. 1991. Towards Kleene Algebra with Recursion. In Computer Science Logic (CSL). 242-256.
[23]
P. M. Lewis, II and R. E. Stearns. 1968. Syntax-Directed Transduction. J. ACM 15, 3 (July 1968), 465-488.
[24]
Conor McBride and Ross Paterson. 2008. Applicative programming with effects. J. Funct. Program. 18, 1 (2008), 1-13.
[25]
Kristian Nielsen and Morten Heine Sørensen. 1995. Call-By-Name CPS-Translation As a Binding-Time Improvement. In Proceedings of the Second International Symposium on Static Analysis (SAS '95). Springer-Verlag, London, UK, UK, 296-313. http://dl.acm.org/citation.cfm?id=647163.717677.
[26]
François Pottier and Yann Régis-Gianas. 2017. Menhir Reference Manual. INRIA. http://gallium.inria.fr/~fpottier/menhir/.
[27]
Tiark Rompf, Arvind K. Sujeeth, Nada Amin, Kevin J. Brown, Vojin Jovanovic, HyoukJoong Lee, Manohar Jonnalagedda, Kunle Olukotun, and Martin Odersky. 2013. Optimizing Data Structures in High-Level Programs: New Directions for Extensible Compilers Based on Staging. In The 40th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '13, Rome, Italy - January 23 - 25, 2013, Roberto Giacobazzi and Radhia Cousot (Eds.). ACM, 497-510.
[28]
Arto Salomaa. 1973. Formal Languages. Academic Press.
[29]
Michael Sperber and Peter Thiemann. 2000. Generation of LR Parsers by Partial Evaluation. ACM Trans. Program. Lang. Syst. 22, 2 (March 2000), 224-264.
[30]
S. Doaitse Swierstra and Luc Duponcheel. 1996. Deterministic, Error-Correcting Combinator Parsers. In Advanced Functional Programming, Second International School, Olympia, WA, USA, August 26-30, 1996, Tutorial Text. 184-207.
[31]
Walid Mohamed Taha. 1999. Multistage Programming: Its Theory and Applications. Ph.D. Dissertation. Oregon Graduate Institute of Science and Technology. AAI9949870.
[32]
Ken Thompson. 1968. Programming Techniques: Regular Expression Search Algorithm. Commun. ACM 11, 6 (June 1968), 419-422.
[33]
Joost Winter, Marcello M Bonsangue, and Jan Rutten. 2011. Context-Free Languages, Coalgebraically. In International Conference on Algebra and Coalgebra in Computer Science. Springer, 359-376.
[34]
Jeremy Yallop and Oleg Kiselyov. 2019. Generating Mutually Recursive Definitions. In Proceedings of the 2019 ACM SIGPLAN Workshop on Partial Evaluation and Program Manipulation (PEPM 2019). ACM, New York, NY, USA, 75-81.
[35]
Zoltán Ésik and Hans Leiß. 2005. Algebraically Complete Semirings and Greibach Normal Form. Annals of Pure and Applied Logic 133 (1-3) (2005), 173-203.

Cited By

View all

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
PLDI 2019: Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation
June 2019
1162 pages
ISBN:9781450367127
DOI:10.1145/3314221
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 June 2019

Permissions

Request permissions for this article.

Check for updates

Badges

Author Tags

  1. Kleene algebra
  2. context-free languages
  3. parsing
  4. type theory

Qualifiers

  • Research-article

Conference

PLDI '19
Sponsor:

Acceptance Rates

Overall Acceptance Rate 406 of 2,067 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)66
  • Downloads (Last 6 weeks)9
Reflects downloads up to 17 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Paguroidea: Fused Parser Generator with Transparent Semantic ActionsProceedings of the 33rd ACM SIGPLAN International Conference on Compiler Construction10.1145/3640537.3641563(39-48)Online publication date: 17-Feb-2024
  • (2024)Program generation meets program verificationScience of Computer Programming10.1016/j.scico.2023.103035232:COnline publication date: 1-Jan-2024
  • (2024)MetaOCaml: Ten Years LaterFunctional and Logic Programming10.1007/978-981-97-2300-3_12(219-236)Online publication date: 15-May-2024
  • (2024)A Proof Theory of ($$\omega $$-)Context-Free Languages, via Non-wellfounded ProofsAutomated Reasoning10.1007/978-3-031-63501-4_13(237-256)Online publication date: 2-Jul-2024
  • (2023)flap: A Deterministic Parser with Fused LexingProceedings of the ACM on Programming Languages10.1145/35912697:PLDI(1194-1217)Online publication date: 6-Jun-2023
  • (2023)Type-based Termination Analysis for Parsing Expression GrammarsProceedings of the 38th ACM/SIGAPP Symposium on Applied Computing10.1145/3555776.3577620(1372-1379)Online publication date: 27-Mar-2023
  • (2022)A Type-Directed Algorithm to Generate Random Well-Formed Parsing Expression GrammarsProceedings of the XXVI Brazilian Symposium on Programming Languages10.1145/3561320.3561326(8-14)Online publication date: 6-Oct-2022
  • (2022)Oregano: staging regular expressions with Moore Cayley fusionProceedings of the 15th ACM SIGPLAN International Haskell Symposium10.1145/3546189.3549916(66-80)Online publication date: 6-Sep-2022
  • (2022)Staging with class: a specification for typed template HaskellProceedings of the ACM on Programming Languages10.1145/34987236:POPL(1-30)Online publication date: 12-Jan-2022
  • (2022)Unified Program Generation and Verification: A Case Study on Number-Theoretic TransformFunctional and Logic Programming10.1007/978-3-030-99461-7_8(133-151)Online publication date: 10-May-2022
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media