Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3276604.3276615acmconferencesArticle/Chapter ViewAbstractPublication PagessplashConference Proceedingsconference-collections
research-article

Morbig: a static parser for POSIX shell

Published: 24 October 2018 Publication History

Abstract

The POSIX shell language defies conventional wisdom of compiler construction on several levels: The shell language was not designed for static parsing, but with an intertwining of syntactic analysis and execution by expansion in mind. Token recognition cannot be specified by regular expressions, lexical analysis depends on the parsing context and the evaluation context, and the shell grammar given in the specification is ambiguous. Besides, the unorthodox design choices of the shell language fit badly in the usual specification languages used to describe other programming languages. This makes the standard usage of LEX and YACC as a pipeline inadequate for the implementation of a parser for POSIX shell.
The existing implementations of shell parsers are complex and use low-level character-level parsing code which is difficult to relate to the POSIX specification. We find it hard to trust such parsers, especially when using them for writing automatic verification tools for shell scripts.
This paper offers an overview of the technical difficulties related to the syntactic analysis of the POSIX shell language. It also describes how we have resolved these difficulties using advanced parsing techniques (namely speculative parsing, parser state introspection, context-dependent lexical analysis and longest-prefix parsing) while keeping the implementation at a sufficiently high level of abstraction so that experts can check that the POSIX standard is respected.
The resulting tool, called MORBIG, is an open-source static parser for a well-defined and realistic subset of the POSIX shell language. Its implementation crucially relies on the purity and incrementality of LR(1) parsers generated by MENHIR, a parser generator for OCaml.

References

[1]
A. Afroozeh and A. Izmaylova. Iguana: a practical data-dependent parsing framework. In Proceedings of the 25th International Conference on Compiler Construction, pages 267-268. ACM, 2016.
[2]
A. V. Aho, M. S. Lam, R. Sethi, and J. D. Ullman. Compilers: Principles, Techniques, and Tools (2nd Edition). Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2006.
[3]
J. Aycock and R. N. Horspool. Schrödinger's token. Softw., Pract. Exper., 31:803-814, 2001.
[4]
J. Aycock and R. N. Horspool. Practical earley parsing. The Computer Journal, 45(6):620-630, 2002.
[5]
R. Braakman, J. Rodin, J. Gilbey, and M. Hobley. checkbashisms. https://sourceforge.net/projects/checkbaskisms/, Nov. 2015.
[6]
R. Di Cosmo, D. Di Ruscio, P. Pelliccione, A. Pierantonio, and S. Zacchiroli. Supporting software evolution in component-based FOSS systems. Science of Computer Programming, 76(12):1144-1160, 2011.
[7]
R. Di Cosmo and S. Zacchiroli. Software heritage: Why and how to preserve software source code. In iPRES 2017: 14th International Conference on Digital Preservation, 2017.
[8]
D. Di Ruscio, P. Pelliccione, A. Pierantonio, and S. Zacchiroli. Towards maintainer script modernization in FOSS distributions. In IWOCE 2009: International Workshop on Open Component Ecosystem, pages 11-20. ACM, 2009.
[9]
J. Earley. An efficient context-free parsing algorithm. Communications of the ACM, 13(2):94-102, 1970.
[10]
E. Gamma. Design patterns: elements of reusable object-oriented software. Pearson Education India, 1995.
[11]
M. Greenberg. Understanding the POSIX shell as a programming language. In Off the Beaten Track 2017, Paris, France, Jan. 2017.
[12]
V. Holen. shellcheck. https://github.com/koalaman/shellcheck, 2015.
[13]
IEEE and The Open Group. The open group base specifications issue 7. http://www.unix.org/version3/online.html, 2016.
[14]
IEEE and The Open Group. The open group base specifications issue 7. http://pubs.opengroup.org/onlinepubs/9699919799/, 2018.
[15]
A. Izmaylova, A. Afroozeh, and T. v. d. Storm. Practical, general parser combinators. In Proceedings of the 2016 ACM SIGPLAN Workshop on Partial Evaluation and Program Manipulation, pages 1-12. ACM, 2016.
[16]
D. E. Knuth. On the translation of languages from left to right. Information and control, 8(6):607-639, 1965.
[17]
K. Mazurak and S. Zdancewic. ABASH: finding bugs in bash scripts. In PLAS07: Proceedings of the 2007 workshop on Programming languages and analysis for security, pages 105-114, San Diego, CA, USA, June 2007.
[18]
D. Pager. A practical general method for constructing LR (k) parsers. Acta Informatica, 7(3):249-268, 1977.
[19]
F. Pottier. Reachability and error diagnosis in LR(1) parsers. In Proceedings of the 25th International Conference on Compiler Construction, CC 2016, Barcelona, Spain, March 12-18, 2016, pages 88-98, 2016.
[20]
F. Pottier. Visitors. http://gallium.inria.fr/~fpottier/visitors/manual.pdf, 2017.
[21]
F. Pottier and Y. Régis-Gianas. The Menhir parser generator. See: http://gallium.inria.fr/fpottier/menhir.
[22]
E. Scott and A. Johnstone. GLL parsing. Electronic Notes in Theoretical Computer Science, 253(7):177-189, 2010.
[23]
P. Stansifer and M. Wand. Parsing reflective grammars. In Proceedings of the Eleventh Workshop on Language Descriptions, Tools and Applications, page 10. ACM, 2011.
[24]
M. Tomita and S.-K. Ng. The generalized LR parsing algorithm. In Generalized LR parsing, pages 1-16. Springer, 1991.
[25]
M. G. van den Brand, A. van Deursen, J. Heering, H. De Jong, M. de Jonge, T. Kuipers, P. Klint, L. Moonen, P. A. Olivier, J. Scheerder, et al. The asf+sdf meta-environment: A component-based language development environment. In International Conference on Compiler Construction, pages 365-370. Springer, 2001.
[26]
E. Visser et al. Scannerless generalized-LR parsing. Universiteit van Amsterdam. Programming Research Group, 1997.

Cited By

View all
  • (2020)Analysing installation scenarios of Debian packagesTools and Algorithms for the Construction and Analysis of Systems10.1007/978-3-030-45237-7_14(235-253)Online publication date: 17-Apr-2020
  • (2019)Executable formal semantics for the POSIX shellProceedings of the ACM on Programming Languages10.1145/33711114:POPL(1-30)Online publication date: 20-Dec-2019
  • (2019)Root cause localization for unreproducible builds via causality analysis over system call tracingProceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering10.1109/ASE.2019.00056(527-538)Online publication date: 10-Nov-2019

Index Terms

  1. Morbig: a static parser for POSIX shell

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    SLE 2018: Proceedings of the 11th ACM SIGPLAN International Conference on Software Language Engineering
    October 2018
    219 pages
    ISBN:9781450360296
    DOI:10.1145/3276604
    Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 24 October 2018

    Permissions

    Request permissions for this article.

    Check for updates

    Badges

    Author Tags

    1. POSIX shell
    2. Parsing
    3. functional programming

    Qualifiers

    • Research-article

    Conference

    SLE '18
    Sponsor:

    Upcoming Conference

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)6
    • Downloads (Last 6 weeks)1
    Reflects downloads up to 18 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2020)Analysing installation scenarios of Debian packagesTools and Algorithms for the Construction and Analysis of Systems10.1007/978-3-030-45237-7_14(235-253)Online publication date: 17-Apr-2020
    • (2019)Executable formal semantics for the POSIX shellProceedings of the ACM on Programming Languages10.1145/33711114:POPL(1-30)Online publication date: 20-Dec-2019
    • (2019)Root cause localization for unreproducible builds via causality analysis over system call tracingProceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering10.1109/ASE.2019.00056(527-538)Online publication date: 10-Nov-2019

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media