Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/2993236.2993244acmconferencesArticle/Chapter ViewAbstractPublication PagesgpceConference Proceedingsconference-collections
research-article

Synthesizing regular expressions from examples for introductory automata assignments

Published: 20 October 2016 Publication History

Abstract

We present a method for synthesizing regular expressions for introductory automata assignments. Given a set of positive and negative examples, the method automatically synthesizes the simplest possible regular expression that accepts all the positive examples while rejecting all the negative examples. The key novelty is the search-based synthesis algorithm that leverages ideas from over- and under-approximations to effectively prune out a large search space. We have implemented our technique in a tool and evaluated it with non-trivial benchmark problems that students often struggle with. The results show that our system can synthesize desired regular expressions in 6.7 seconds on the average, so that it can be interactively used by students to enhance their understanding of regular expressions.

References

[1]
Aws Albarghouthi, Sumit Gulwani, and Zachary Kincaid. Recursive program synthesis. In Proceedings of the 25th International Conference on Computer Aided Verification - Volume 8044, CAV 2013, pages 934–950, New York, NY, USA, 2013. Springer-Verlag New York, Inc.
[2]
Dana Angluin. Learning regular sets from queries and counterexamples. Inf. Comput., 75(2):87–106, November 1987.
[3]
Daniel W. Barowy, Sumit Gulwani, Ted Hart, and Benjamin Zorn. Flashrelate: Extracting relational data from semistructured spreadsheets using examples. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’15, pages 218–228, New York, NY, USA, 2015. ACM.
[4]
David F. Barrero, Mar´ıa D. R-Moreno, and David Camacho. Adapting searchy to extract data using evolved wrappers. Expert Syst. Appl., 39(3):3061–3070, February 2012.
[5]
Alberto Bartoli, Giorgio Davanzo, Andrea De Lorenzo, Marco Mauri, Eric Medvet, and Enrico Sorio. Automatic generation of regular expressions from examples with genetic programming. In Proceedings of the 14th Annual Conference Companion on Genetic and Evolutionary Computation, GECCO ’12, pages 1477–1478, New York, NY, USA, 2012. ACM.
[6]
Alberto Bartoli, Giorgio Davanzo, Andrea De Lorenzo, Eric Medvet, and Enrico Sorio. Automatic synthesis of regular expressions from examples. Computer, 47(12):72–80, December 2014.
[7]
Alberto Bartoli, Andrea De Lorenzo, Eric Medvet, and Fabiano Tarlao. Playing regex golf with genetic programming. In Proceedings of the 2014 Annual Conference on Genetic and Evolutionary Computation, GECCO ’14, pages 1063–1070, New York, NY, USA, 2014. ACM.
[8]
Alberto Bartoli, Andrea De Lorenzo, Eric Medvet, and Fabiano Tarlao. Inference of regular expressions for text extraction from examples. IEEE Trans. on Knowl. and Data Eng., 28(5):1217–1230, May 2016.
[9]
Josh Bongard and Hod Lipson. Active coevolutionary learning of deterministic finite automata. J. Mach. Learn. Res., 6:1651–1678, December 2005.
[10]
Falk Brauer, Robert Rieger, Adrian Mocan, and Wojciech M. Barczynski. Enabling information extraction by inference of regular expressions from sample entities. In Proceedings of the 20th ACM International Conference on Information and Knowledge Management, CIKM ’11, pages 1285–1294, New York, NY, USA, 2011. ACM.
[11]
Duy Duc An Bui and Qing Zeng-Treitler. Learning regular expressions for clinical text classification. Journal of the American Medical Informatics Association, 21(5):850–857, 2014.
[12]
Ahmet Cetinkaya. Regular expression generation through grammatical evolution. In Proceedings of the 9th Annual Conference Companion on Genetic and Evolutionary Computation, GECCO ’07, pages 2643–2646, New York, NY, USA, 2007. ACM.
[13]
B. D. Dunay, F. E. Petry, and B. P. Buckles. Regular language induction with genetic programming. In Evolutionary Computation, 1994. IEEE World Congress on Computational Intelligence., Proceedings of the First IEEE Conference on, pages 396–400 vol.1, Jun 1994.
[14]
Henning Fernau. Algorithms for learning regular expressions from positive data. Inf. Comput., 207(4):521–541, April 2009.
[15]
John K. Feser, Swarat Chaudhuri, and Isil Dillig. Synthesizing data structure transformations from input-output examples. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’15, pages 229–239, New York, NY, USA, 2015. ACM.
[16]
Sumit Gulwani. Automating string processing in spreadsheets using input-output examples. In Proceedings of the 38th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL ’11, pages 317–330, New York, NY, USA, 2011. ACM.
[17]
John E. Hopcroft. Introduction to Automata Theory, Languages, and Computation. Pearson Addison Wesley, 3rd edition, 2007.
[18]
Efim Kinber. Learning regular expressions from representative examples and membership queries. In Proceedings of the 10th International Colloquium Conference on Grammatical Inference: Theoretical Results and Applications, ICGI’10, pages 94–108, Berlin, Heidelberg, 2010. Springer-Verlag.
[19]
John R. Koza. Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge, MA, USA, 1992.
[20]
Peter Linz. An Introduction to Formal Language and Automata. Jones and Bartlett Publishers, Inc., USA, 2006.
[21]
A. R. Meyer and L. J. Stockmeyer. The equivalence problem for regular expressions with squaring requires exponential space. In Proceedings of the 13th Annual Symposium on Switching and Automata Theory (Swat 1972), SWAT ’72, pages 125–129, Washington, DC, USA, 1972. IEEE Computer Society.
[22]
Rajesh Parekh and Vasant Honavar. An incremental interactive algorithm for regular grammar inference. In Proceedings of the Thirteenth National Conference on Artificial Intelligence - Volume 2, AAAI’96, pages 1397–1397. AAAI Press, 1996.
[23]
Rajesh Parekh and Vasant G. Honavar. Learning dfa from simple examples. Mach. Learn., 44(1-2):9–35, July 2001.
[24]
Paul Prasse, Christoph Sawade, Niels Landwehr, and Tobias Scheffer. Learning to identify concise regular expressions that describe email campaigns. J. Mach. Learn. Res., 16(1):3687– 3720, January 2015.
[25]
Rishabh Singh and Sumit Gulwani. Synthesizing number transformations from input-output examples. In Proceedings of the 24th International Conference on Computer Aided Verification, CAV’12, pages 634–651, Berlin, Heidelberg, 2012. Springer-Verlag.
[26]
Michael Sipser. Introduction to the Theory of Computation. Cengage Learning, 3rd edition, 2012.
[27]
Borge Svingen. Learning regular languages using genetic programming. In John R. Koza, Wolfgang Banzhaf, Kumar Chellapilla, Kalyanmoy Deb, Marco Dorigo, David B. Fogel, Max H. Garzon, David E. Goldberg, Hitoshi Iba, and Rick Riolo, editors, Genetic Programming 1998: Proceedings of the Third Annual Conference, pages 374–376, University of Wisconsin, Madison, Wisconsin, USA, 22-25 July 1998. Morgan Kaufmann.
[28]
Yinglian Xie, Fang Yu, Kannan Achan, Rina Panigrahy, Geoff Hulten, and Ivan Osipkov. Spamming botnets: Signatures and characteristics. In Proceedings of the ACM SIGCOMM 2008 Conference on Data Communication, SIGCOMM ’08, pages 171–182, New York, NY, USA, 2008. ACM.

Cited By

View all
  • (2024)Automating Pruning in Top-Down Enumeration for Program Synthesis Problems with Monotonic SemanticsProceedings of the ACM on Programming Languages10.1145/36897448:OOPSLA2(935-961)Online publication date: 8-Oct-2024
  • (2024)Programming-by-Demonstration for Long-Horizon Robot TasksProceedings of the ACM on Programming Languages10.1145/36328608:POPL(512-545)Online publication date: 5-Jan-2024
  • (2024)Structure and design of multimodal dataset for automatic regex synthesis methods in Roman UrduInternational Journal of Data Science and Analytics10.1007/s41060-024-00612-yOnline publication date: 23-Jul-2024
  • Show More Cited By

Index Terms

  1. Synthesizing regular expressions from examples for introductory automata assignments

      Recommendations

      Comments

      Please enable JavaScript to view thecomments powered by Disqus.

      Information & Contributors

      Information

      Published In

      cover image ACM Conferences
      GPCE 2016: Proceedings of the 2016 ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences
      October 2016
      212 pages
      ISBN:9781450344463
      DOI:10.1145/2993236
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Sponsors

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 20 October 2016

      Permissions

      Request permissions for this article.

      Check for updates

      Author Tags

      1. Regular expression
      2. program synthesis
      3. programming by example

      Qualifiers

      • Research-article

      Conference

      GPCE '16
      Sponsor:
      GPCE '16: Generative Programming: Concepts and Experiences
      October 31 - November 1, 2016
      Amsterdam, Netherlands

      Acceptance Rates

      Overall Acceptance Rate 56 of 180 submissions, 31%

      Contributors

      Other Metrics

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months)61
      • Downloads (Last 6 weeks)11
      Reflects downloads up to 27 Nov 2024

      Other Metrics

      Citations

      Cited By

      View all
      • (2024)Automating Pruning in Top-Down Enumeration for Program Synthesis Problems with Monotonic SemanticsProceedings of the ACM on Programming Languages10.1145/36897448:OOPSLA2(935-961)Online publication date: 8-Oct-2024
      • (2024)Programming-by-Demonstration for Long-Horizon Robot TasksProceedings of the ACM on Programming Languages10.1145/36328608:POPL(512-545)Online publication date: 5-Jan-2024
      • (2024)Structure and design of multimodal dataset for automatic regex synthesis methods in Roman UrduInternational Journal of Data Science and Analytics10.1007/s41060-024-00612-yOnline publication date: 23-Jul-2024
      • (2024)Automatic regex synthesis methods for english: a comparative analysisKnowledge and Information Systems10.1007/s10115-024-02232-1Online publication date: 3-Oct-2024
      • (2024)Enhancing Multi-modal Regular Expression Synthesis via Large Language Models and Semantic Manipulations of Sub-expressionsDependable Software Engineering. Theories, Tools, and Applications10.1007/978-981-96-0602-3_7(122-141)Online publication date: 25-Nov-2024
      • (2023)Saggitarius: A DSL for Specifying Grammatical DomainsProceedings of the ACM on Programming Languages10.1145/36228697:OOPSLA2(2023-2051)Online publication date: 16-Oct-2023
      • (2023)Data Extraction via Semantic Regular Expression SynthesisProceedings of the ACM on Programming Languages10.1145/36228637:OOPSLA2(1848-1877)Online publication date: 16-Oct-2023
      • (2023)Inductive Program Synthesis Guided by Observational Program SimilarityProceedings of the ACM on Programming Languages10.1145/36228307:OOPSLA2(912-940)Online publication date: 16-Oct-2023
      • (2023)Search-Based Regular Expression Inference on a GPUProceedings of the ACM on Programming Languages10.1145/35912747:PLDI(1317-1339)Online publication date: 6-Jun-2023
      • (2023)ImageEye: Batch Image Processing using Program SynthesisProceedings of the ACM on Programming Languages10.1145/35912487:PLDI(686-711)Online publication date: 6-Jun-2023
      • Show More Cited By

      View Options

      Login options

      View options

      PDF

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader

      Media

      Figures

      Other

      Tables

      Share

      Share

      Share this Publication link

      Share on social media