Short paper, DOI: 10.1145/3626246.3654746

Demonstrating REmatch: A Novel RegEx Engine for Finding all Matches

Published: 09 June 2024

Abstract

In this demonstration we showcase REmatch, a regular expression (RegEx) engine built to find all matches of a given pattern in a document. REmatch is based on the theory of enumeration algorithms, and it extends classical RegEx engines with the ability to find nested and overlapping matches using a simple and intuitive syntax, without non-standard operators, while maintaining efficient performance. The algorithmic core of REmatch builds a compressed representation of all matching results and enumerates them on demand in time proportional to writing them down symbol by symbol. For this demonstration, we developed a simple Web interface to the REmatch engine, available at https://rematch.cl, and we will showcase the utility of the engine in tasks such as DNA sequence analysis and linguistic analysis, among others. To this end, we have prepared a series of examples that illustrate the expressiveness and ease of use of REmatch. Additionally, at the demo we will have a series of challenge tasks, asking attendees to capture the result set of a simple REmatch expression using their favourite RegEx engine.
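
To make the notion of "all matches" concrete, the sketch below contrasts what a classical engine reports with the full set of overlapping and nested matches. It uses Python's standard `re` module, not REmatch, and the helper names `leftmost_matches` and `all_matches` are hypothetical. The brute-force enumeration is only an illustration of the expected result set under the assumption that a match is any span whose text matches the whole pattern; it is not the compressed-representation algorithm described in the abstract.

```python
import re


def leftmost_matches(pattern: str, text: str) -> list[tuple[int, int]]:
    """Spans reported by a classical engine: non-overlapping, left to right."""
    return [m.span() for m in re.finditer(pattern, text)]


def all_matches(pattern: str, text: str) -> list[tuple[int, int]]:
    """Every non-empty span (i, j) such that text[i:j] matches the whole pattern.

    Brute force: tries all O(n^2) spans. Illustration only, not REmatch's method.
    """
    compiled = re.compile(pattern)
    n = len(text)
    return [
        (i, j)
        for i in range(n + 1)
        for j in range(i + 1, n + 1)
        if compiled.fullmatch(text, i, j)
    ]


if __name__ == "__main__":
    text = "ATAT"
    pattern = "A.*T"
    print(leftmost_matches(pattern, text))  # [(0, 4)]
    print(all_matches(pattern, text))       # [(0, 2), (0, 4), (2, 4)]
```

On the input `ATAT`, the classical scan returns only the single greedy match, while the exhaustive enumeration also recovers the two shorter matches nested inside it. The quadratic number of candidate spans is what makes this naive approach impractical on long documents.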


      Published In

      SIGMOD/PODS '24: Companion of the 2024 International Conference on Management of Data
      June 2024
      694 pages
      ISBN:9798400704222
      DOI:10.1145/3626246
      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. REmatch
      2. RegEx
      3. information extraction
      4. regular expressions

      Qualifiers

      • Short-paper

      Funding Sources

      • ANID - Millennium Science Initiative Program

      Conference

      SIGMOD/PODS '24
      Acceptance Rates

      Overall Acceptance Rate 785 of 4,003 submissions, 20%
